(57) Abstract: The present invention concerns s-ship promoter and developmental decision promoter compositions and methods
of using the promoter. It includes polynucleotides, vectors, host cells, and transgenic animals including a developmental decision
promoter, for example, an s-ship promoter, controlling the expression of a heterologous nucleic acid. Methods of the invention
include expressing a heterologous nucleic acid in a tissue-specific, developmental-specific, or temporally controlled
manner. Other methods include screening methods and therapeutic methods.



WO 2005/090559 A1



European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, 
FR, GB, GR, HU, IE, IS, IT, LT, LU, MC, NL, PL, PT, RO, 
SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN,
GQ, GW, ML, MR, NE, SN, TD, TG). 

Published: 

with international search report 



before the expiration of the time limit for amending the 
claims and to be republished in the event of receipt of 
amendments 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.






DESCRIPTION 



PCT/US2005/008977 



METHODS AND COMPOSITIONS INVOLVING S-SHIP PROMOTER REGIONS 

BACKGROUND OF THE INVENTION 

The present application claims priority to U.S. Provisional Patent Application 
60/554,318, filed on March 18, 2004, which is hereby incorporated by reference in its 
entirety. The government may own rights in the invention pursuant to grant number CA82499 
from the National Institutes of Health. 

1. Field of the Invention

The present invention relates generally to the fields of molecular and developmental 
biology. More particularly, it concerns methods and compositions involving s-SHIP promoter 
regions, which can be used to promote transcription in particular cell types and at particular 
times during development. 

2. Description of Related Art 

Stem cells have been the focus of tremendous interest in recent years because of the 
progress made in developmental and molecular biology and the promise of therapeutic 
applications in a wide variety of contexts, from heart disease to diabetes, and cancer to 
Parkinson's disease (see generally Abbott et al., 2003; Daley, 2003; Hirai, 2002; Kondo et
al., 2003; Nakano, 2003). Toward fulfilling this promise, many researchers have engaged in
extensive studies to characterize factors and pathways in stem cell development and to 
evaluate candidate therapeutic and diagnostic agents. Such agents include proteins that are 
gene products, sometimes heterologous, in the stem cells. The ability to express a transgene 
in stem cells is critical for providing data toward these endeavors. The study of genes 
normally expressed in stem cells has yielded not only information regarding the 
developmental, cellular, and molecular biology of these cells, but also useful tools for further 
studies. 

Pathways involved in stem cell function include the protein phosphatidylinositol 3-kinase
(PI3K), which becomes activated through cell surface receptors. PI3K is involved in
the generation of phosphatidylinositol 3,4,5-trisphosphate, which activates signaling
pathways leading to cell proliferation. The SH2-containing inositol 5'-phosphatase (SHIP1)
removes the phosphate group from the D5 position of phosphatidylinositols, which is
considered a significant feedback mechanism on cell activation for hematopoietic cells
(Lioubin et al., 1996; Rohrschneider et al., 2000).

A form of SHIP1 lacking the SH2 domain has been identified and referred to as stem
or short SHIP (s-SHIP) (Tu et al., 2001). Tu et al. found the protein contains amino acids
encoded by exons 6-27 of SHIP1 and that it is expressed in embryonic and hematopoietic
stem cells. It was initially unclear how s-SHIP protein was produced from the ship1 gene.
Kavanaugh et al. (1996) suggested that SIP-110 was a spliced product of SHIP1; however,
Tu et al. (2001) proposed, based on its cDNA sequence, that it was transcribed from a
promoter within the SHIP1 gene. This was inferred from the fact that the first 44 nucleotides
of the s-SHIP cDNA were at the 5' end immediately before exon 6 of SHIP1. These 44
nucleotides were not contained in the SHIP1 cDNA, but were identical to the 44 nucleotides
of genomic ship1 intron 5, immediately adjacent to exon 6. However, no functional evidence
for an s-ship promoter was provided. Therefore, while a promoter with the tissue-specific
expression of s-ship could be valuable from both a research and therapeutic/diagnostic
perspective, further investigation of the s-ship gene was required to identify the promoter and
characterize any tissue-specific activity.

Moreover, a promoter providing the developmental-specific expression of
the s-ship promoter, which includes expression in stem cells, has not previously been
described. Thus, the present invention addresses these issues.

SUMMARY OF THE INVENTION 

The present invention concerns methods and compositions involving a functional and
isolated s-ship promoter. The invention includes nucleic acid molecules, host cells, and
transgenic organisms having an s-ship promoter, as well as methods of using the promoter for
transcription, expression studies, stem cell analyses, and therapeutic applications. In addition,
the present invention relates to methods and compositions involving an isolated and functional
promoter capable of directing transcription a) in adult and embryonal stem/progenitor cells
and b) in adult and embryonal stem/progenitor cells that have differentiated to the point
where the promoter directs transcription only when a developmental decision is required, i.e.,
when the cell is in the resting or growing phase, or the transition from the resting to the
growing phase. This promoter will be referred to as a "developmental decision promoter."
S-ship promoter regions discussed herein constitute a developmental decision promoter. Thus,
it is contemplated that embodiments discussed with respect to an s-ship promoter can be
applied more generally to a developmental decision promoter. Consequently, the present
invention covers those embodiments with respect to developmental decision promoters.

The present invention concerns an s-ship promoter and its functional derivatives. The
term "promoter" is used according to its ordinary and plain meaning to a person skilled in the
art of eukaryotic transcriptional regulation. The terms "s-ship promoter" or "s-ship1
promoter" refer to the nucleic acid sequence from the s-ship gene that is capable of
promoting transcription of a nucleic acid sequence that is connected to it (downstream).
Transcription can be assayed in any number of ways known to those of skill in the
art, including, but not limited to, an expression assay using a screenable or selectable marker,
ribonuclease protection assay (RNAP), RT-PCR, and in vitro transcription reactions, all of
which are well known to those of skill in the art and can be implemented using commercially
available reagents and protocols (see generally, Sambrook et al., 1989; Ausubel, 1992 and
1994, all of which are incorporated by reference).

Compositions of the invention include isolated polynucleotides comprising an s-ship
promoter capable of promoting transcription. SEQ ID NO:1 is a 102 kb genomic mouse
ship1 sequence. In certain embodiments, the s-ship1 promoter comprises, or has at least or at
most, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60,
70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250,
260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430,
440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610,
620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790,
800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970,
980, 990, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200,
2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700,
3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 6000, 7000,
8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000,
20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, 29000, 30000, or more
contiguous nucleotides of the ship1 gene, including SEQ ID NO:1-23, or any range derivable
therein. In specific embodiments, the s-ship1 promoter includes one or more of the following
regions of SEQ ID NO:1: from nucleotide (nt) 49485 to 61006 (11.5 kb-GFP construct); from nt
49485 to 57111, which is 7626 nt (7.6 kb construct); from nt 49485 to 55810, which is 6326
nt (6.3 kb construct); from nt 54807 to 61006, but lacking 57109 to 57944 (6.2 kb-GFP
construct); from nt 51389 to 55810, which is 4421 nt (4.4 kb construct); from nt 52199 to
56423, which is 4224 nt (4.2 kb construct); from nt 53820 to 55810, which is 1990 nt (1.9 kb
construct); from nt 54755 to 55810, which is 1055 nt (0.96 kb construct); or from nt 55668 to
55810, which is 142 nt (44 nt construct). It is further contemplated that the lengths of
contiguous nucleotides discussed above can be applied with respect to these identified
regions of SEQ ID NO:1, as well as any other sequence disclosed herein.
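
By way of illustration only, the stated region lengths can be checked against the quoted coordinates of SEQ ID NO:1. The following is a minimal sketch; the names REGIONS, span, and check_regions are hypothetical and are not part of the disclosure. It assumes a length is counted as the end coordinate minus the start coordinate, which reproduces every stated length except that of the 6.3 kb construct, where the stated 6326 nt corresponds to an inclusive count of both end positions.

```python
# Coordinates and stated lengths are those quoted in the text for SEQ ID NO:1.
# The data layout and function names are illustrative assumptions.
REGIONS = [
    ("7.6 kb construct", 49485, 57111, 7626),
    ("6.3 kb construct", 49485, 55810, 6326),
    ("4.4 kb construct", 51389, 55810, 4421),
    ("4.2 kb construct", 52199, 56423, 4224),
    ("1.9 kb construct", 53820, 55810, 1990),
    ("0.96 kb construct", 54755, 55810, 1055),
    ("44 nt construct", 55668, 55810, 142),
]


def span(start, end):
    """Length counted as end minus start (assumed convention)."""
    return end - start


def check_regions(regions):
    """Return (label, computed length, stated length, agreement) per region."""
    return [
        (label, span(start, end), stated, span(start, end) == stated)
        for label, start, end, stated in regions
    ]


if __name__ == "__main__":
    for label, computed, stated, ok in check_regions(REGIONS):
        print(f"{label}: computed {computed} nt, stated {stated} nt, match={ok}")
```

A check of this kind may be useful when subcloning, since a one-nucleotide difference in convention determines whether the terminal position is included in a construct.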

Moreover, any of these lengths or regions discussed in the context of SEQ ID NO:1
apply to the corresponding regions of SEQ ID NO:2, SEQ ID NO:3, and SEQ ID NO:4. SEQ
ID NO:2 includes the genomic sequence for the mouse ship gene from exon 5 through exon 7
(exon 5, intron 5, exon 6, intron 6, and exon 7 inclusive). SEQ ID NO:3 is a mouse s-ship
promoter sequence that includes the 560 nucleotides upstream of exon 6 (in intron 5). SEQ
ID NO:4 is a human s-ship promoter sequence that includes the 560 nucleotides upstream of
exon 6 (in intron 5). SEQ ID NO:5 is the mouse s-ship promoter region in the 11.5 kb-GFP
construct. While the s-ship promoters provided are from human and mouse, the invention is
not limited to these species. Mammalian s-ship promoters are
contemplated, particularly those with homology to the sequences disclosed in the application.
SEQ ID NO:6 is the sequence from intron 5 of s-ship that has a p53 family binding motif
5'-ATCTTTGCCC/GGGGCTTGTCCT-3', meaning that members of the p53 family of proteins
have been shown to bind to sequences homologous or identical to this sequence. SEQ ID
NO:7 is a sequence from the s-ship promoter that is homologous to a Pax8 binding sequence
(5'-CACT/AGAAGGTT-3'). SEQ ID NO:8 is a sequence from the s-ship promoter that is
homologous to a Smad 3/4 binding sequence (5'-GT/GC/GTGGGCCAG-3'). SEQ ID NO:9
is a sequence from the s-ship promoter that is homologous to a Stat 1/5 binding sequence
(5'-TCAGGGA/GAG-3'). SEQ ID NO:10 is a sequence from the s-ship promoter that is
homologous to a GATA/Lmo2 sequence (5'-GTGC/GCCTATCT-3'). It is understood by
those of skill in the art that the convention of using a slash (/) indicates two alternative
nucleotides at that position, where the slash separates the alternatives. It is also contemplated
that sequences of the invention may include one or more of these consensus sequences for
these binding sites. In certain embodiments, these motifs may be repeated singly or in
combination with one another.
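
The slash convention described above lends itself to a simple pattern search. The following is a minimal sketch, assuming each slash separates exactly two alternative nucleotides at a single position; the function names motif_to_regex and find_motif, and the test sequence in the usage note, are hypothetical and are not drawn from the sequence listing.

```python
import re


def motif_to_regex(motif):
    """Convert slash notation, e.g. 'CACT/AGAAGGTT', to 'CAC[TA]GAAGGTT'.

    Each 'X/Y' pair is read as "X or Y at this one position".
    """
    return re.sub(r"([ACGT])/([ACGT])", r"[\1\2]", motif)


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A lookahead is used so that overlapping occurrences are also reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For example, under these assumptions the Pax8-like motif of SEQ ID NO:7 would be located in a made-up sequence with find_motif("CACT/AGAAGGTT", "ggCACAGAAGGTTcc"). Only the forward strand is scanned; searching the reverse complement would be a separate step.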
Moreover, it is contemplated that the promoter contains enhancer activity. In some
embodiments of the invention, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150,
160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330,
340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510,
520, 530, 540, 550, 560, 570, 580, 590, 600 contiguous nucleotides, or any range derivable
therein, from the region including the first 600 nucleotides upstream of exon 6 are included in
s-ship promoters of the invention. Segments that are at least 80%,
85%, 90%, 95%, 96%, 97%, 98%, 99% identical to any of these regions, or capable of
hybridizing to the complement of such a region, are also contemplated as part of the invention.
Such segments may further comprise sequences in intron 6, exon 5, exon 6 or other regions
of the ship1 gene.

Functional derivatives of the s-ship promoter are also contemplated
by the invention. Functional derivatives of an s-ship promoter may be at least 80%, 85%,
90%, 95%, 96%, 97%, 98%, 99% identical to the polynucleotides discussed herein. Such
derivatives may also be characterized by any of the lengths of contiguous nucleotides
discussed above. Moreover, polynucleotides of the invention include those that are capable of
hybridizing to all or part of the recited lengths of SEQ ID NOs:1-10 discussed herein,
including those particular regions recited in the previous paragraph. Such polynucleotides
may be capable of hybridization under high, medium or low stringency conditions.
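
The percent identity thresholds recited above can be illustrated with a simple position-by-position comparison. The following is a minimal sketch, assuming two sequences of equal length that are already aligned without gaps; in practice identity is normally computed from a gapped alignment, which this sketch does not attempt. The function name percent_identity and the example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two equal-length sequences.

    Assumes an ungapped, pre-aligned pair; raises ValueError otherwise.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def meets_threshold(seq_a, seq_b, threshold=80.0):
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Under these assumptions, two made-up 10-nucleotide sequences differing at one position are 90% identical and would satisfy the 80%, 85%, and 90% thresholds but not the 95% threshold.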

In specific embodiments, the s-ship promoter is capable of promoting tissue-specific
transcription. Transcription may be accomplished, in some embodiments of the invention, in
skin, a hair follicle, cornea, embryo, gonads, mammary gland, pancreas, and/or vascular
smooth muscle. It is also contemplated that transcription may be achieved in cells qualified as
or with characteristics of stem cells, which may or may not be derived from skin, a hair
follicle, cornea, embryo, gonads, mammary gland, pancreas, and/or smooth muscle. In some
embodiments, transcription is achieved in a hematopoietic cell in a tissue-specific manner,
including in hematopoietic stem or progenitor cells, but also in more mature or differentiated
cells.

It is specifically contemplated that the s-ship promoter directs transcription in a
developmentally and/or cell-cycle-dependent manner. In some embodiments, the s-ship
promoter directs transcription in a stem or progenitor cell that differentiates to a point at which
the promoter no longer provides expression at all or provides expression only at certain times,
such as when the cell is preparing for a growth and/or developmental phase. As discussed above,
the invention concerns developmental decision promoters such as s-ship, and embodiments
discussed with respect to s-ship can be applied with respect to developmental decision promoters,
and vice versa.

In some embodiments, the present invention includes a promoter that is capable of
directing transcription in cells that qualify as stem or progenitor cells and/or cells that have
undergone some differentiation but are not terminally differentiated and that are not in a
resting state. In some embodiments, the invention provides isolated polynucleotides,
expression cassettes, vectors and host cells comprising a heterologous nucleic acid sequence
under the control of a developmental decision promoter.

In these embodiments, the developmental decision promoter is capable of providing
expression in embryonic stem cells. In other embodiments, the promoter is capable of
providing expression in adult stem cells. It is contemplated that the adult stem cells are
differentiated but not terminally differentiated; in other words, they are self-renewing and
capable of being differentiated into other cell types derivative of the stem cell. For example,
the adult stem cell may be a hematopoietic or epidermal stem cell, meaning it is capable of
self-renewing and becoming any hematopoietic or epidermal skin cell. The term "terminally
differentiated" is used according to its ordinary and plain meaning according to those of
ordinary skill in the art.

In other embodiments, the developmental decision promoter is capable of providing
expression in adult stem cells that are in growing phase (i.e., in a non-resting phase of mitosis
or meiosis). In certain other cases, the promoter is capable of providing expression in a cell
from mouse embryonic development stages E3-E18.5. The E numbers refer to age of the
embryo, based on days, from approximate conception, for example, as set forth on the World
Wide Web at the following address:
genex.hgu.mrc.ac.uk/CDROM_onl^^ It is
contemplated that the expression may be achieved at mouse embryonic development stage E1,
E2, E3, E3.5, E4, E4.5, E5, E5.5, E6, E6.5, E7, E7.5, E8, E8.5, E9, E9.5, E10, E10.5, E11,
E11.5, E12, E12.5, E13, E13.5, E14, E14.5, E15, E15.5, E16, E16.5, E17, E17.5, E18, E18.5,
E19, E19.5, E20 or later, or any combination derivable therein, or any corresponding stage of
development in another species of mammal. In some embodiments, the promoter can provide
expression throughout all these stages of development (constitutive) or through a subset (not
constitutive throughout). In particular embodiments, the promoter is capable of directing
transcription in stem/progenitor cells of an embryo and also in adult stem cells periodically
(non-constitutively) or constitutively.
It is further contemplated that the developmental decision promoter is capable
of providing expression in a cell that is in a developed animal. A developed
animal refers to an animal that has already been born and is no longer a fetus or embryo.
Thus, the cell may be in an animal that has already been born and it may be near or
surrounded by differentiated cells or tissue. For example, the cell may be a stem or progenitor
cell in the developed animal.

In particular embodiments, the developmental decision promoter is an s-ship promoter
region that comprises a sequence that can hybridize under stringent conditions to a nucleic acid
segment comprising the complement of i) at least 20 contiguous nucleotides of SEQ ID
NO:1, SEQ ID NO:2, or SEQ ID NO:3; or ii) SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7,
SEQ ID NO:8, SEQ ID NO:9 and/or SEQ ID NO:10.

In some embodiments of the invention, an s-ship promoter is operably attached to a
heterologous nucleic acid. The term "heterologous" is used according to its plain and
ordinary meaning to a person skilled in the art of molecular biology. It is a relative term and
in the context of an s-ship promoter, it refers to a nucleic acid sequence that is not normally
found in nature (with respect to sequence and position) with the s-ship promoter. In other
words, it refers to any nucleic acid that is not the entire genomic sequence of the s-ship gene.
In some embodiments, the s-ship promoter is connected to a nucleic acid sequence encoding
part of the s-ship gene product or all or part of an s-ship1 cDNA sequence. Alternatively, the
s-ship promoter may be placed at a location different than is found in nature.

Because recombinant cells and transgenic animals, including knockout versions
thereof, are part of the invention, the present invention further encompasses nucleic acids
containing an s-ship gene or a portion thereof and a marker sequence, wherein the s-ship gene
is disrupted by the marker sequence. In some embodiments, the nucleic acid is under the
control of a promoter, which is an s-ship promoter in further embodiments. The promoter
may also be constitutive, inducible, or conditional. Promoters discussed herein may be
tissue-specific (spatially restricted), developmental-specific (providing transcription at specific
developmental stages or times), and/or temporally restricted.

The present invention further concerns expression cassettes, vectors, and host cells
that contain or include polynucleotides having an s-ship promoter that has been isolated away
from its chromosomal context. The polynucleotides and embodiments discussed above may
be implemented with respect to expression cassettes, vectors, and host cells.

It is contemplated that the s-ship promoter may control the transcription of a nucleic
acid sequence encoding a marker. In some embodiments, the marker is colorimetric,
enzymatic, or fluorescent. Examples include, but are not limited to, β-galactosidase,
chloramphenicol acetylase, luciferase, and green fluorescent protein. In further
embodiments, a heterologous nucleic acid segment encodes a therapeutic or diagnostic gene
product. The therapeutic or diagnostic gene product may be a protein or RNA molecule, such
as an siRNA or miRNA molecule. In some embodiments, the therapeutic gene product is
selected from the group consisting of a tumor suppressor, an oncogene, a cytokine, a cytokine
receptor, a differentiation-inducer, a growth factor, and a growth factor receptor. It is
contemplated that more than one heterologous sequence or gene may be placed under the
control of a promoter. Examples of such proteins are well known to those of skill in the art,
and include, but are not limited to, interleukins (IL-2, -6, -8, -9, -10, -11, -12, -13, -14, -15,
-16, -17, -18, -19, -20, -21, -22, -23, -24, etc.), interferons, receptor tyrosine kinases and their
ligands (kit/steel, CSFR/CSF, GM-CSFR/GM-CSF, PDGFR/PDGF, flk-1/VEGF, Lif, EGF,
FGF, etc.), transforming growth factors α and β, Epo, IGF, and tumor necrosis factor α and β. A
number of examples can be seen on the world wide web at
indstate.edu/thcme/mwking/growth-factors.html, which is specifically incorporated by
reference. In specific embodiments, it is contemplated that the heterologous nucleic acid encodes
a protein that can transform or immortalize a cell, such as an oncogene product. In certain
embodiments, a stem cell can be immortalized or transformed.

In some embodiments of the invention, a vector is a plasmid, YAC, BAC, or virus. 
Viruses include adenovirus, adeno-associated virus, retrovirus, flavivirus, and vaccinia virus. 

Compositions of the invention may be prepared in a pharmaceutically or 
pharmacologically acceptable formulation. Such formulations are well known to those of skill 
in the art for use in in vivo contexts. 

Other aspects of the invention include a host cell having an s-ship promoter operably
attached to a heterologous nucleic acid segment. In some embodiments, the host cell is 
eukaryotic, though it may be prokaryotic. In specific embodiments, the host cell is from a 
mammal, insect, bacteria, or yeast. Cells from monkeys, mice, rats, rabbits, hamsters, ferrets, 
and humans are specifically contemplated for use with nucleic acids of the invention. In some 
cases, the host cell is an embryonic cell, which may specifically be a blastocyst cell. In other 
cases, the host cell is a stem or progenitor cell. In some cases, the cell is a hematopoietic cell, 
meaning any cell in that lineage. It is contemplated that the cell may be in vitro or in vivo. 

Cells that can be used according to methods and compositions of the invention 
include, but are not limited to, CD34+ cells (cells expressing CD34 on their surface), 
undifferentiated cells, stem cells, progenitor cells, cord blood cells, placental cells, neonatal
or fetal cells, immature cells, pluripotent cells, and totipotent cells. The term "stem cell" is
used according to its ordinary meaning, for example, as described by the National Institutes
of Health (on the World Wide Web at stemcells.nih.gov). Stem cells 1) are "capable of
dividing and renewing themselves for long periods"; 2) are unspecialized; and 3) can give
rise to specialized cell types.

The invention specifically contemplates the use of embryonic stem cells, adult stem
cells, or neonatal and fetal stem cells. An adult stem cell typically refers to a stem cell from a
particular organ or tissue that is capable of differentiating into one or more cells of that organ
or tissue. Umbilical cord blood contains stem cells that are similar to embryonic stem cells in
that they are believed to be capable of being differentiated into a number of different cell
types, as opposed to cell types of one particular organ or tissue. Umbilical cord blood refers
to blood that remains in the umbilical cord and placenta following birth and after the cord is
cut. "Placental blood" is understood to be synonymous with cord blood; similarly, cord
blood stem cell is considered synonymous with placental or placental blood stem cell. The
use of stem cells from umbilical cord blood is specifically contemplated in certain
embodiments of the invention. In some but not all cases, the use of other stem cells is
specifically not considered part of the invention; in particular, the use of pancreatic/endocrine
progenitor or stem cells is not considered for use with some embodiments. Furthermore, cells
of the invention may be characterized by cell surface antigens. Cell surface antigens and their
correlation with cell type and cell development are known to those of ordinary skill in the art.

It will be understood that cultures or samples containing cells discussed above are 
also contemplated for use according to methods and compositions of the invention. 

Further embodiments of the invention include cells for use in the generation of
transgenic organisms (knock-in and knock-out). Accordingly, there are recombinant host
cells in which one or both s-ship genes are disrupted by a marker sequence or in which all or part
of an s-ship gene is flanked by an excisable sequence, such as a loxP sequence. The marker
sequence serves the purpose of showing when the transgenic sequence is present or absent in
the cell.

The present invention further concerns transgenic animals comprising an s-ship
promoter operably attached to a heterologous nucleic acid segment. Mammals are specifically
contemplated, particularly mice. In some cases, the invention involves a mammal having
cells comprising an s-ship transgenic sequence. The s-ship sequence may be knocked in or
out in a restricted or controlled manner. For example, whether it is knocked in or out may be
controlled in a tissue-specific, inducible, conditional, developmental or temporal manner.
Consequently, animals may have heterologous genes under the control of a promoter or
system that operates in that way. The Cre-Lox system is one example. The transgene of
interest itself may not be under the control of a limited promoter, but a secondary gene whose
product initiates the knock-in or knock-out process may be under such a promoter. In one
embodiment, animals of the invention may have an s-ship transgenic sequence that includes
an s-ship coding sequence flanked by loxP sequences. They may also have a heterologous
nucleic acid sequence encoding a Cre recombinase. In some cases, the nucleic acid sequence
encoding the Cre recombinase is under the control of an inducible or conditional promoter.
Transgenic animals of the invention are not limited by the Cre-Lox system, which serves as
an example of how expression may be controlled.
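
The effect of Cre recombinase on a sequence flanked by loxP sites can be pictured at the string level. The following is a minimal sketch only, assuming two loxP sites in the same orientation, so that recombination removes the intervening segment and leaves a single loxP site. The 34-nucleotide LOXP string is the commonly cited loxP sequence and is not taken from this disclosure; the function name cre_excise and the flanking sequences in the usage note are hypothetical.

```python
# Commonly cited 34-nt loxP site (13-nt repeat, 8-nt spacer, 13-nt repeat).
# Assumption: not drawn from the sequence listing of this document.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(sequence, site=LOXP):
    """Remove the segment between the first two same-orientation sites.

    Returns the sequence unchanged if fewer than two sites are found.
    One copy of the site remains in the product.
    """
    first = sequence.find(site)
    if first == -1:
        return sequence
    second = sequence.find(site, first + len(site))
    if second == -1:
        return sequence
    # Keep everything through the first site, then everything after the second.
    return sequence[: first + len(site)] + sequence[second + len(site):]
```

For instance, with a made-up construct "AAA" + LOXP + "CCCC" + LOXP + "GGG", the sketch returns "AAA" + LOXP + "GGG". Inverted sites, which lead to inversion rather than excision, are not modeled.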

A number of methods are included as part of the present invention. In some
embodiments, there are methods for expressing a recombinant nucleic acid in a cell
comprising: a) transfecting the cell with an expression cassette comprising an s-ship promoter
operably attached to the recombinant nucleic acid, wherein the nucleic acid is transcribed.
The cell may be any of the host cells discussed above. Moreover, it is contemplated that
embodiments may be carried out with a developmental decision promoter, which may be an
s-ship promoter region. In some embodiments there are methods for expressing a nucleic acid
in a stem cell comprising providing to a cell a polynucleotide including the nucleic acid under
the control of a developmental decision promoter, wherein the nucleic acid is expressed in the
cell. It is contemplated that a cell may be in a subject. The cell may have been provided with
a nucleic acid in vivo or in vitro. In the latter case, a cell may be introduced into a subject
thereafter.

Alternatively, an isolated nucleic acid encoding a developmental decision promoter
can be provided to a cell such that the promoter integrates into the cell's genome to drive
expression of a gene that becomes operably linked to the promoter. The present invention 
covers methods and compositions for implementing the expression strategy. 

Other embodiments of the invention concern methods of screening for a candidate
substance that regulates activity of the s-ship promoter comprising a step selected from the
group consisting of: (a) contacting a nucleic acid comprising an s-ship promoter with an
s-ship promoter binding protein and the candidate substance under conditions that allow
binding between the protein and the promoter and determining whether the candidate
compound modulates the binding between the protein and the promoter; and (b) contacting
the candidate substance with a cell comprising the s-ship promoter operably attached to a
reporter gene coding for an expression product and assaying for expression of the reporter
gene expression product. One or both steps may be employed. Ways of determining whether
the candidate compound modulates binding between a protein and the promoter are well
known to those of skill in the art. The compound may inhibit, reduce, decrease, eliminate,
increase, promote, or tighten the binding between the protein and the promoter. Assays for such
an interaction include, but are not limited to, electrophoretic mobility shift assays (EMSA),
DNA footprinting, functional transcription assays (as described above), Southwestern
assays, and PCR-based assays.

The present invention also includes methods for identifying stem cells in a population
of cells comprising: (a) administering to cells in the population a nucleic acid comprising an
s-ship promoter operably attached to a reporter or marker gene. The reporter or marker gene
is then used to identify positively-expressing cells, which would indicate the cell is a stem
cell. The cell may be in an organ and/or in an animal. In some embodiments, methods include
sorting cells based on expression of the reporter or marker gene. In addition to the assays
discussed above, FACS analysis may be employed, as may other cell sorting
techniques. Methods include differentiation of the cells.

Aspects of the invention also concern methods for screening for a modulator of cell
function comprising: a) transfecting a stem or hematopoietic cell with an expression cassette
comprising an s-ship promoter operably attached to a nucleic acid encoding a candidate
modulator; and, b) assaying the cell for a cell function, wherein a difference in cell function
in the cell as compared to a cell in the absence of the candidate modulator is indicative of a
modulator. The term "modulator" refers to a substance that affects cell function. It may affect
cell function by acting on or through a pathway. The modulator may inhibit, reduce,
eliminate, decrease, increase, promote, induce, or enhance a particular cell function or result
of a pathway in the cell. It is contemplated that this method may be employed to identify a
modulator as a candidate therapeutic agent for the treatment of a blood-related disease or
condition.

Therapeutic methods are also provided by the present invention. Methods are not
necessarily limited to a particular disease or condition. It is contemplated that any method
involving expression in stem cells, or in other cells in which the s-ship promoter can
function, may be used in therapeutic methods of the invention. For example, the method may
be applied to pancreatic disorders and diseases.

Thus, in some embodiments, there is a method of treating a patient with a blood-related
disease or condition comprising: a) transfecting a cell with an expression cassette
comprising an s-ship promoter region operably attached to a therapeutic nucleic acid; and, b)
administering the cell to the patient. Blood-related diseases or conditions include blood-related
cancers (such as leukemia, lymphoma, or myeloma) and anemia. In some cases, the
blood-related condition can be treated using stem cell replacement therapy.

Other methods of the invention include a method of treating a tumor
comprising providing to stem cells of the tumor an agent that promotes their destruction. In
some embodiments a patient with a tumor is provided with a host cell or expression construct
comprising a developmental decision promoter, such as an s-ship promoter region, that
provides expression for a therapeutic agent in the tumor stem cell. The therapeutic agent may
be a protein or a nucleic acid. In certain embodiments, the agent promotes apoptosis or cell
death of the tumor stem cell, such as with a toxin or apoptosis inducer.

Other methods of the invention include ways of tracking stem cells or isolating stem 
cells. Expression using developmental decision promoters can be used to track or isolate stem 
cells by virtue of their expressing a product that can be tracked or used to isolate the stem 
cells. In terms of tracking, the product may be some kind of reporter, which may or may not 
be a cell surface marker. In the case of isolating stem cells, the product will allow the stem 
cells to be separated from non-stem cells, such as by FACS analysis based on an expressed 
cell surface marker. 

Cells for therapeutic use may, in addition to the cells discussed above, be bone 
marrow cells, or be autologous or allogeneic. 

It is contemplated that any embodiment discussed with respect to any method or
composition described herein can be implemented with respect to any other method or
composition described herein. For example, an embodiment discussed with respect to an
s-ship promoter region applies to a developmental decision promoter, and vice versa. Similarly,
any embodiment discussed with respect to a polynucleotide, primer, expression construct, host
cell, transgenic organism, or method of the invention is contemplated for use with any other
polynucleotides, primers, expression constructs, host cells, transgenic organisms, and
methods of the invention, and vice versa.

The use of the term "or" in the claims is used to mean "and/or" unless explicitly
indicated to refer to alternatives only or the alternatives are mutually exclusive, although the
disclosure supports a definition that refers to only alternatives and "and/or."

Throughout this application, the term "about" is used to indicate that a value includes
the standard deviation of error for the device or method being employed to determine the
value.

Following long-standing patent law convention, the words "a" and "an," when used in
conjunction with the word "comprising" in the claims or specification, denote one or more,
unless specifically noted.

Other objects, features and advantages of the present invention will become apparent
from the following detailed description. It should be understood, however, that the detailed
description and the specific examples, while indicating specific embodiments of the
invention, are given by way of illustration only, since various changes and modifications
within the spirit and scope of the invention will become apparent to those skilled in the art
from this detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS 

The following drawings form part of the present specification and are included to 
further demonstrate certain aspects of the present invention. The invention may be better 
understood by reference to one or more of these drawings in combination with the detailed 
description of specific embodiments presented herein. 

FIG. 1. ship1 genomic segments cloned into a promoter-less expression vector for
testing cell-specific promoter activity. The upper line represents the general ship1 genomic
region containing potential activity for cell-specific s-SHIP expression. Intron 5 contains the
likely promoter activity and transcription is proposed to begin before exon 6. The 44 intronic
nucleotides contained in the s-SHIP cDNA are shown in red. A 7.6 kb genomic fragment
(second line down), as well as the indicated sub-fragments, were cloned into a promoter-less
plasmid for GFP expression. The design and construction of the plasmid are detailed in
Materials and Methods.

FIG. 2. Flow cytometry analysis of cell type-specific promoter activity in D3 ES
cells vs. NIH3T3 cells. Each construct shown was cloned into a promoter-less GFP plasmid,
which was linearized and electroporated into D3 ES cells, or transfected into NIH3T3 cells.
G418 resistant colonies were then examined by flow cytometry for GFP expression. Two
different "empty vector" negative controls were utilized depending on whether the insert
contained a splice acceptor or both splice acceptor and donor. Both these plasmids without
genomic insert were negative for GFP expression in both cell types, but only a single
negative-control vector is shown. Two positive-control plasmids were utilized in each
experiment. These controls expressed GFP from an IRES, and one of them expressed a protein
insert; both were positive in each cell type.

FIG. 3A-B. Structure of the 11.5-kb and 6.2-kb transgenic promoter-GFP constructs
for in vivo analysis. FIG. 3A. Two promoter transgenic constructs were prepared. The first
construct is called the 11.5kb-GFP transgene, and contains the entire genomic ship1 segment
from the Sac I site near the 5' end of intron 5 through the putative translation start site at an
ATG preceded by a suitable Kozak sequence within exon 7. The translational start ATG for
the enhanced GFP is fused, in frame, to the likely ATG translational start for s-SHIP. A
second transgenic construct, called the 6.2kb-GFP transgene, is identical to the 11.5kb-GFP
construct, except it contains only 0.96 kb upstream of exon 6, and lacks 833 nt within intron
6. FIG. 3B. Transgenic copy numbers were estimated by semi-quantitative PCR analysis
relative to the endogenous diploid gab2 gene.

FIG. 4. Computer analysis of 600 nucleotides of the intron-5 transgene promoter
region. A. The region immediately upstream of exon 6 is shown with potential transcription
factor binding motifs determined by analysis using the MatInspector program. Only the
factors with matrix and core similarity greater than 0.9 are shown. Those factor motifs within
the strand shown are over-lined, while those factors potentially interacting with the
complementary strand are shown underlined. The SSR or stem-SHIP region identified by Tu
et al., 2001 is in bold, and an initiator sequence for transcription is situated at the beginning
of the SSR. Exon 6 (not shown) begins at the 3' end of the SSR.

FIG. 5. The two primary proteins, s-SHIP and SHIP1, are produced from the ship1
gene. The domain structure of the two proteins is shown above the ship1 genomic intron/exon
organization. Transcription for the full-length 145 kDa SHIP1 protein initiates in promoter 1
(Pro 1), utilizes all 27 exons, and translation begins in the exon 1 encoded region. The stop
codon is the first three nucleotides of exon 27. Transcription for the s-SHIP protein begins
within intron 5 (Pro 2), and downstream is identical to the SHIP1 product. Translation,
however, presumably begins in the first ATG of exon 7. Both transcripts and protein
sequences are therefore identical from the ATG in exon 7 through the stop codon in exon 27.
The dashed lines indicate translation start and stop points for each protein within the genomic
exons.

FIG. 6. The 560-nucleotide regions immediately upstream of exon 6 from the mouse
and human sequences were compared. Inr indicates the initiator sequence. Binding sites are
also identified.

FIG. 7. p53 binding sequences with half sites are depicted. This sequence is SEQ ID
NO:7.

DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

The present invention is based on the isolation and characterization of the s-ship1
promoter, which can be used to promote transcription. Methods and compositions involving
the s-ship1 promoter are provided herein. In some embodiments, they take advantage of the
tissue specificity of s-ship1 expression. s-ship1 encodes a protein whose expression has been
observed in limited cell populations, and thus, the tissue-specificity of its promoter can be
exploited in a number of different ways.

I. SHIP1 and s-SHIP Background 

The s-ship1 promoter was studied because of the function and expression patterns for
the s-ship1 (also referred to as s-ship) and ship1 gene products. The murine SHIP1 protein is
encoded in 27 exons of the Inpp5d (inositol polyphosphate-5-phosphatase D) locus, spanning
approximately 102 kbp on chromosome 1 at position 57.0 cM of the genetic map, or
cytoband C5 of the cytogenetic map (reviewed in Rohrschneider et al., 2000; Wolf et al.,
2001; NCBI databases). The full-length protein is 145 kDa, but splicing, involving exons 25
and 26, can produce 4 additional proteins ranging in size from 109 to 135 kDa (Lucas and
Rohrschneider, 1999; Wolf et al., 2000). These splicing reactions affect the 350-amino acid
C-terminal tail region and its numerous protein interaction motifs, such as those binding PTB,
SH2, and SH3 domains.

The prominent structural features of the SHIP1 protein dictate its major functional
interactions. The SHIP1 SH2 domain has general specificity for tyrosine-phosphorylated
Yxx(L/I/V) amino acid motifs, and its inositol 5'-phosphatase domain removes phosphate
from the 5' position of inositol(3,4,5)P3, phosphatidylinositol(3,4,5)P3 or inositol(1,3,4,5)P4
[see Sly et al. (2003) for review]. The tyrosine-phosphorylated C-terminal tail interacts
directly with the PTB domain of Shc and Dok proteins (Lioubin et al., 1996; Sattler et al.,
2001; Tamir et al., 2000), and a potential interaction motif for the SH2 domain of the p85
component of the p85/PI3K is present in the full-length SHIP1 (Gupta et al., 1999; Lucas and
Rohrschneider 1999), but eliminated by the splicing events producing the α and β isoforms
(Rohrschneider et al., 2000). Polyproline-rich interaction motifs for the SH3 domains of Grb2
also are present in the C-tail region (Kavanaugh et al., 1996). The SHIP1 proteins (e.g., the
145 kDa protein and isoforms thereof) are expressed in hematopoietic cells and testes, with
lower expression observed in a few other adult tissues (Q. Liu et al., 1998, reviewed in
Rohrschneider, 2003).

Functionally, both biochemical and genetic experiments indicate SHIP1 is a negative
regulator of myeloid cell proliferation, survival, and perhaps chemotaxis (see Sly et al., 2003;
Rohrschneider, 2003). Also, SHIP1 negatively regulates degranulation, inflammatory
cytokine release, and adhesion for mast cells, and SHIP1 is a component of negative
signaling (anergy) in B cell proliferation. The molecular mechanisms for most of these effects
require the attachment of the SHIP1 SH2 domain to the cytoplasmic portions of
transmembrane receptors containing appropriate tyrosine-phosphorylated interaction motifs.
There, the SHIP1 inositol-5'-phosphatase domain converts the plasma membrane
PI3K-produced substrate, phosphatidylinositol(3,4,5)P3, to phosphatidylinositol(3,4)P2, effectively
terminating proliferation signals. Therefore, the SH2 domain of SHIP1 plays a critical role in
initiating many of these negative biological effects.

An additional smaller protein from the ship1 locus is described as an SH2-less 104
kDa protein (Tu et al., 2001). This product is called s-SHIP, with the prefix signifying its
only known expression within two stem cell types (i.e., ES cells and lineage-depleted
Sca1-positive cells of the bone marrow). This protein was first described by Kavanaugh et al.
(1996) and called SIP-110 in the human; but details of its existence were unclear until Tu et
al. (2001) defined the cDNA and demonstrated endogenous expression in the two cell types
described above. Thus, structurally, s-SHIP differs from SHIP1 only by the lack of the
N-terminal SH2 domain; but biochemically, s-SHIP also lacks tyrosine phosphorylation and
association with Shc (Kavanaugh et al., 1996; Tu et al., 2001). Nevertheless, s-SHIP
constitutively interacts with Grb2. The lack of an SH2 domain in s-SHIP indicates its
interaction mechanism with target proteins probably differs from that of SHIP1; however, the
biological functions of s-SHIP are not known.

II. Nucleic Acids 

A. Polynucleotides 

The s-ship1 promoter was identified as a strong promoter for s-SHIP by analyses of
the genomic ship1 intron-5 region in driving GFP expression both in vitro and in vivo. This
promoter exhibited cell-type specific expression in ES cells, and mice transgenic for the
promoter (the 11.5kb-GFP transgene) showed tissue-specific GFP expression within the inner
cell mass of the blastocyst. Transgenic mice produced with a shorter promoter construct (the
6.2kb-GFP transgene) expressed GFP throughout the blastocyst, suggesting the absence of
negative regulatory regions in the shortened transgene. RT-PCR analysis demonstrated s-SHIP
expression within the blastocyst. These results indicate that the 11.5-kb promoter region of
the transgene contains the information for tissue-specific expression of s-SHIP, as well as
tissue-specific shut-off of this protein. It is specifically contemplated that this promoter and
the transgenic mice will be useful for future examination of GFP-expression in potential
stem/progenitor cells of the embryo and the adult mouse.

The present invention concerns polynucleotides, isolatable from cells, that are free
from total genomic DNA and that contain an s-ship promoter. It is contemplated that the
s-ship1 promoter is capable of directing transcription of a nucleic acid sequence. Transcription
may be directed in a tissue-specific or developmental manner. The nucleic acid sequence may
encode a peptide or polypeptide, or it may also encode an RNA molecule that is not
translated into a protein.

A "promoter" is a control sequence that is a region of a nucleic acid sequence at
which initiation and rate of transcription are controlled. It may contain genetic elements at
which regulatory proteins and molecules may bind, such as RNA polymerase and
transcription factors. The phrases "operatively positioned," "operatively linked," "under
control," "operatively attached," and "under transcriptional control" mean that a promoter is
in a correct functional location and/or orientation in relation to a nucleic acid sequence to
control transcriptional initiation and/or expression of that sequence. Typically, the promoter
is located 5' or upstream from the strand of sequence to be transcribed. A promoter may or
may not be used in conjunction with an "enhancer," which refers to a cis-acting regulatory
nucleic acid sequence involved in the transcriptional activation of a nucleic acid sequence.
More particularly, it refers to a nucleic acid sequence that is tissue-specific and stimulates
transcription regardless of orientation (forward or reverse orientations both work). The
inventors believe that within the first 600 nucleotides upstream of exon 6 there is enhancer
activity. Consequently, it is contemplated that all or part of this region may be included in
a nucleic acid construct containing a segment of SEQ ID NO:1.

As used herein, the term "DNA segment" or "nucleic acid segment" refers to a DNA
or nucleic acid molecule that has been isolated free of total genomic DNA of a particular
species. Therefore, a DNA segment encoding a polypeptide refers to a segment that contains
wild-type, polymorphic, or mutant polypeptide-coding sequences yet is isolated away from,
or purified free from, total mammalian or human genomic DNA. Included within the term
"DNA segment" are a polypeptide or polypeptides, DNA segments smaller than a
polypeptide, and recombinant vectors, including, for example, plasmids, cosmids, phage,
viruses, and the like.

As used in this application, the term "s-ship polynucleotide" refers to an s-ship-encoding 

RNA (mRNA) as template.

It also is contemplated that a particular polypeptide from a given species may be
represented by natural variants that have slightly different nucleic acid sequences but,
nonetheless, encode the same protein.

Similarly, a polynucleotide comprising an isolated or purified wild-type, polymorphic,
or mutant polypeptide gene refers to a DNA segment including wild-type, polymorphic, or
mutant polypeptide coding sequences isolated substantially away from other naturally
occurring genes or protein encoding sequences. In this respect, the term "gene" is used for
simplicity to refer to a functional protein, polypeptide, or peptide-encoding unit. As will be
understood by those in the art, this functional term includes genomic sequences, cDNA
sequences, and smaller engineered gene segments that express, or may be adapted to express,
proteins, polypeptides, domains, peptides, fusion proteins, and mutants. A nucleic acid
encoding all or part of a native or modified polypeptide may contain a contiguous nucleic
acid sequence encoding all or a portion of such a polypeptide of the following lengths: about
[...] 130, 140, 150, 160, 180, 200, 210,
220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390,
400, 410, 420, 430, 440, 441, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560,
570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740,
750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920,
930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080,
1090, 1095, 1100, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000,
7500, 8000, 9000, 10000, or more nucleotides, nucleosides, or basepairs.

In particular embodiments, the invention concerns isolated DNA segments and
recombinant vectors incorporating an s-ship promoter with a heterologous nucleic acid
sequence or a ship or s-ship cDNA segment. The isolated DNA segment or vector
containing a DNA segment may encode, for example, the heterologous nucleic acid
sequence. The term "recombinant" may be used in conjunction with a polypeptide, the name
of a specific polypeptide, a nucleic acid sequence, or a host cell, and this generally means that
the entity involves or involved a nucleic acid molecule that was manipulated in vitro using
recombinant DNA technology.

The nucleic acid segments used in the present invention, regardless of the length of
the coding sequence itself, may be combined with other nucleic acid sequences, such as
enhancers, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites,
other coding segments, and the like, such that their overall length may vary considerably. It
is therefore contemplated that a nucleic acid fragment of almost any length may be employed,
with the total length preferably being limited by the ease of preparation and use in the
intended recombinant DNA protocol.

It is contemplated that the nucleic acid constructs of the present invention may encode
a full-length polypeptide from any source or encode a truncated version of the polypeptide,
such that the transcript of the coding region represents the truncated version. The truncated
transcript may then be translated into a truncated protein. Alternatively, a nucleic acid
sequence may encode a full-length polypeptide sequence with additional heterologous coding
sequences, for example to allow for purification of the polypeptide, transport, secretion,
post-translational modification, or for therapeutic benefits such as targeting or efficacy. As
discussed above, a tag or other heterologous polypeptide may be added to the modified
polypeptide-encoding sequence, wherein "heterologous" refers to a polypeptide that is not the
same as the modified polypeptide.

In a non-limiting example, one or more nucleic acid constructs may be prepared that
include a contiguous stretch of nucleotides of sequences disclosed herein, including the s-ship
promoter.

A nucleic acid construct may be at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120,
130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000,
3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 50,000,
100,000, 250,000, 500,000, 750,000, to at least 1,000,000 nucleotides in length, as well as
constructs of greater size, up to and including chromosomal sizes (including all intermediate
lengths and intermediate ranges), given that nucleic acid constructs such as a yeast
artificial chromosome are known to those of ordinary skill in the art. It will be readily
understood that "intermediate lengths" and "intermediate ranges," as used herein, mean any
length or range including or between the quoted values (i.e., all integers including and
between such values).

It is specifically contemplated that nucleic acids of the invention may include, be at
most, or be at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270,
280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450,
460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630,
640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810,
820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990,
1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400,
2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900,
4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400,
5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900,
7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400,
8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800, 9900,
10000, 10100, 10200, 10300, 10400, 10500, 10600, 10700, 10800, 10900, 11000, 11100,
11200, 11300, 11400, 11500, 11600, 11700, 11800, 11900, 12000 or more contiguous
nucleotides (or any range derivable therein) of nucleic acid disclosed in this application,
including, but not limited to SEQ ID NO:1, intron 5 of the mouse s-ship gene, an s-ship
promoter, and any other SEQ ID NOs such as SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4,
SEQ ID NO:5, or any of SEQ ID NO:6-11 and/or 12-25.

The various probes and primers designed around the nucleotide sequences of the present 
invention may be of any length. By assigning numeric values to a sequence, for example, the 
first residue is 1, the second residue is 2, etc., an algorithm defining all primers can be proposed: 

n to n + y 

where n is an integer from 1 to the last number of the sequence and y is the length of the primer 
minus one, where n + y does not exceed the last number of the sequence. Thus, for a 10-mer, 
the probes correspond to bases 1 to 10, 2 to 11, 3 to 12 ... and so on. For a 15-mer, the probes
correspond to bases 1 to 15, 2 to 16, 3 to 17 ... and so on. For a 20-mer, the probes correspond 
to bases 1 to 20, 2 to 21, 3 to 22 ... and so on. 
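
As a rough illustration of the enumeration just described, the following Python sketch lists
every window n to n + y for a primer of a given length. The function name primer_windows and
the toy 12-base sequence are hypothetical and introduced only for this example; they are not
part of the disclosure.

```python
def primer_windows(sequence, primer_length):
    """Return (start, end, subsequence) for every window n to n + y.

    Positions are 1-based and inclusive, matching the numbering in the text:
    y = primer_length - 1, and n + y never exceeds the last position.
    """
    if primer_length < 1:
        raise ValueError("primer_length must be at least 1")
    y = primer_length - 1
    last = len(sequence)
    windows = []
    for n in range(1, last - y + 1):
        # Convert the 1-based inclusive range [n, n + y] to a Python slice.
        windows.append((n, n + y, sequence[n - 1 : n + y]))
    return windows


# Toy 12-base sequence (hypothetical, for illustration only).
toy = "ACGTACGTACGT"
for start, end, sub in primer_windows(toy, 10):
    print(start, end, sub)
```

For the 12-base toy sequence and a 10-mer, the windows are bases 1 to 10, 2 to 11, and 3 to 12,
in line with the pattern given in the text. A sequence shorter than the primer yields no windows.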

It also will be understood that this invention is not limited to the particular nucleic
acid sequences of SEQ ID NO:1. Recombinant vectors and isolated DNA segments may
therefore variously include coding regions, coding regions bearing selected alterations or
modifications in the basic coding region, or they may encode biologically functional
equivalent sequences. For example, mutations can be made to SEQ ID NOs:1-25 that
potentially enhance or alter function relative to the native sequence or, alternatively, may be
silent with regard to function.

The s-ship promoter sequence of the invention is exemplified by the nucleic acid
sequence given in SEQ ID NO:1. Alternatively, an s-ship promoter sequence can include all
or part of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4 and/or SEQ ID NO:5, as well as any
of SEQ ID NO:6-11, or any sequence with at least 80% identity to such sequences and/or
capable of hybridizing to the complements of such sequences under conditions of high
stringency. The invention is not limited to SEQ ID NO:1, as a person of ordinary skill in the
art could readily manipulate SEQ ID NO:1 or use all or part of it in subsequent assays and
experiments. In certain embodiments, the present invention concerns nucleic acid sequences
capable of hybridizing to all or parts of SEQ ID NO:1. Parts of SEQ ID NO:1 include, in
specific embodiments, an s-ship1 promoter with one or more of the following regions of SEQ
ID NO:1: from nucleotide (nt) 49485 to 60914 (11.5kb-GFP construct); from nt 49485 to 57072,
which is 7588 nt (7.6 kb construct); from nt 49485 to 55810, which is 6326 nt (6.3 kb
construct); from nt 49485 to 54755, but lacking 57050 to 57883 (6.2kb-GFP construct); from nt
51389 to 55810, which is 4421 nt (4.4 kb construct); from nt 52199 to 56423, which is 4224
nt (4.2 kb construct); from nt 53820 to 55810, which is 1990 nt (1.9 kb construct); from nt
54755 to 55810, which is 1055 nt (0.96 kb construct); or from nt 55668 to 55810, which is
142 nt (44 nt construct). It is specifically contemplated that nucleic acids of the invention
include those capable of hybridizing to such regions or to subsets of such regions. Moreover,
it is contemplated that those nucleic acids capable of hybridizing to such regions may be at
least 80, 85, 90, 95, 96, 97, 98, 99% or more complementary to all or part of these regions of
SEQ ID NOs:1-25.
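
The lengths quoted for these regions can be checked with simple arithmetic. The Python sketch
below uses only coordinates stated in the text; the helper names span_inclusive and
span_difference are hypothetical. Some quoted lengths (for example, 7588 nt) match a count that
includes both endpoints, while others (for example, 4421 nt) match the plain coordinate
difference, so the sketch prints both values rather than assuming which convention was intended.

```python
# Start and end positions (nt, within SEQ ID NO:1) as quoted in the text.
REGIONS = {
    "7.6 kb construct": (49485, 57072),
    "6.3 kb construct": (49485, 55810),
    "4.4 kb construct": (51389, 55810),
    "4.2 kb construct": (52199, 56423),
    "1.9 kb construct": (53820, 55810),
    "0.96 kb construct": (54755, 55810),
}


def span_inclusive(start, end):
    """Number of nucleotides when both endpoints are counted."""
    return end - start + 1


def span_difference(start, end):
    """Plain coordinate difference (one endpoint not counted)."""
    return end - start


for name, (start, end) in REGIONS.items():
    print(f"{name}: inclusive={span_inclusive(start, end)} nt, "
          f"difference={span_difference(start, end)} nt")
```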

SEQ ID NO:1 is one sequence for the ship1 gene. The structure of the gene based on
SEQ ID NO:1 is as follows: exon 1 (1-300); exon 2 (4914-4977); exon 3 (44875-45025);
exon 4 (47380-47551); exon 5 (49130-49271); exon 6 (55771-55858); exon 7 (6907 7-
61057); exon 8 (61175-61246); exon 9 (63231-63354); exon 10 (71113-71219); exon 11
(74821-74923); exon 12 (76837-77033); exon 13 (77601-77718); exon 14 (78653-78749);
exon 15 (79268-79403); exon 16 (80787-80894); exon 17 (81041-81129); exon 18 (82789-
82870); exon 19 (85604-85693); exon 20 (87766-87879); exon 21 (89288-89370); exon 22
(90735-90822); exon 23 (91701-91850); exon 24 (92863-92959); exon 25 (94708-94983);
exon 26 (97360-97953); and exon 27 (98991-100141). The respective introns lie between the
exon sequences. In certain embodiments, the s-ship promoter comprises the region spanning
intron 5 to intron 6 (inclusive) (referred to as "intron 5/6 region") or sequences from that
region. This region is in SEQ ID NO:5. It will be understood that there may be minor
sequence differences between different isolates and clones. In such cases, a person of skill in the
art would recognize corresponding regions between different isolates, clones, and strains.
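
Because the introns lie between the listed exons, intron coordinates can be derived from the
exon boundaries. The following Python sketch does this for exons 4 through 6 only, using the
coordinates given above; the function name intron_after is hypothetical, and the sketch assumes
inclusive coordinates with each intron occupying every position between two adjacent exons.

```python
# Exon boundaries (nt, within SEQ ID NO:1) for exons 4-6, as listed in the text.
EXONS = {
    4: (47380, 47551),
    5: (49130, 49271),
    6: (55771, 55858),
}


def intron_after(exons, exon_number):
    """Return (start, end) of the intron following the given exon.

    Assumes the intron occupies every position between the end of this
    exon and the start of the next one, with inclusive coordinates.
    """
    _, upstream_end = exons[exon_number]
    downstream_start, _ = exons[exon_number + 1]
    return upstream_end + 1, downstream_start - 1


intron5 = intron_after(EXONS, 5)
intron5_length = intron5[1] - intron5[0] + 1
print(f"intron 5: {intron5[0]}-{intron5[1]} ({intron5_length} nt)")

# The promoter constructs described in the text start at nt 49485;
# check whether that position falls inside the derived intron 5 interval.
print(intron5[0] <= 49485 <= intron5[1])
```

Under these assumptions intron 5 spans nt 49272 to 55770, and the nt 49485 start of the
promoter constructs falls inside it, consistent with the Sac I site being near the 5' end of
intron 5.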

As used herein, "hybridization", "hybridizes" or "capable of hybridizing" is
understood to mean the forming of a double or triple stranded molecule or a molecule with
partial double or triple stranded nature. The term "anneal" as used herein is synonymous
with "hybridize." The term "hybridization", "hybridize(s)" or "capable of hybridizing"
encompasses the terms "stringent condition(s)" or "high stringency" and the terms "low
stringency" or "low stringency condition(s)."

As used herein, "stringent condition(s)" or "high stringency" are those conditions that
allow hybridization between or within one or more nucleic acid strand(s) containing
complementary sequence(s), but preclude hybridization of random sequences. Stringent
conditions tolerate little, if any, mismatch between a nucleic acid and a target strand. Such
conditions are well known to those of ordinary skill in the art, and are preferred for
applications requiring high selectivity. Non-limiting applications include isolating a nucleic
acid, such as a gene or a nucleic acid segment thereof, or detecting at least one specific mRNA
transcript or a nucleic acid segment thereof, and the like.

Stringent conditions may comprise low salt and/or high temperature conditions, such as
provided by about 0.02 M to about 0.5 M NaCl at temperatures of about 42°C to about 70°C. It
is understood that the temperature and ionic strength of a desired stringency are determined in
part by the length of the particular nucleic acid(s), the length and nucleobase content of the
target sequence(s), the charge composition of the nucleic acid(s), and the presence or
concentration of formamide, tetramethylammonium chloride or other solvent(s) in a
hybridization mixture. It is generally appreciated that conditions can be rendered more
stringent by the addition of increasing amounts of formamide. For example, under highly
stringent conditions, hybridization to filter-bound DNA may be carried out in 0.5 M NaHPO4,
7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C, and washing in 0.1 x SSC/0.1%
SDS at 68°C (Ausubel et al., 1989).

Conditions may be rendered less stringent by increasing salt concentration and/or
decreasing temperature. For example, a medium stringency condition could be provided by
about 0.1 M to about 0.25 M NaCl at temperatures of about 37°C to about 55°C, while a low
stringency condition could be provided by about 0.15 M to about 0.9 M salt, at temperatures
ranging from about 20°C to about 55°C. Under less stringent conditions, such as moderately
stringent conditions, the washing may be carried out, for example, in 0.2 x SSC/0.1% SDS at
42°C (Ausubel et al., 1989). Hybridization conditions can be readily manipulated depending
on the desired results.

It is also understood that these ranges, compositions and conditions for hybridization
are mentioned by way of non-limiting examples only, and that the desired stringency for a
particular hybridization reaction is often determined empirically by comparison to one or
more positive or negative controls. Depending on the application envisioned, it is preferred to
employ varying conditions of hybridization to achieve varying degrees of selectivity of a nucleic
acid towards a target sequence. In a non-limiting example, identification or isolation of a
related target nucleic acid that does not hybridize to a nucleic acid under stringent conditions
may be achieved by hybridization at low temperature and/or high ionic strength. Such
conditions are termed "low stringency" or "low stringency conditions", and non-limiting
examples of low stringency include hybridization performed at about 0.15 M to about 0.9 M
NaCl at a temperature range of about 20°C to about 50°C. Of course, it is within the skill of
one in the art to further modify the low or high stringency conditions to suit a particular
application.

In other embodiments, hybridization may be achieved under conditions of, for 
example, 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl2, 1.0 mM dithiothreitol, at 
temperatures between approximately 20°C to about 37°C. Other hybridization conditions 
utilized could include approximately 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM 
MgCl2, at temperatures ranging from approximately 40°C to about 72°C. 

However, in addition to the unmodified s-ship promoter sequence of SEQ ID NO:1, 
the current invention includes derivatives of this sequence and compositions made therefrom. 
In particular, the present disclosure provides the teaching for one of skill in the art to make 
and use derivatives of the s-ship promoter. For example, the disclosure provides the teaching 
for one of skill in the art to delimit the functional elements within the s-ship promoter and to 
delete any non-essential elements. Functional elements also could be modified to increase 
the utility of the sequences of the invention for any particular application. For example, a 
functional region within the s-ship promoter of the invention could be modified to cause or 
increase tissue-specific expression. Such changes could be made by site-specific mutagenesis 
techniques, for example, as described below. 

One efficient means for preparing such derivatives comprises introducing mutations 
into the sequences of the invention, for example, the sequence given in SEQ ID NO:1. Such 
mutants may potentially have enhanced or altered function relative to the native sequence or, 
alternatively, may be silent with regard to function. It will be understood generally that any 
embodiment discussed in the application with respect to SEQ ID NO:1 may be applied with 
respect to any other SEQ ID NO, and vice versa. 

Mutagenesis may be carried out at random and the mutagenized sequences screened 
for function in a trial-and-error procedure. Alternatively, particular sequences that provide the 
s-ship promoter with desirable expression characteristics could be identified and these or 
similar sequences introduced into other related or non-related sequences via mutation. 
Similarly, non-essential elements may be deleted without significantly altering the function of 
the elements. It further is contemplated that one could mutagenize these sequences in order 
to enhance their utility in expressing transgenes in a particular cell type, for example, in a 
particular stem cell. 

The means for mutagenizing a DNA segment containing an s-ship promoter sequence 
of the current invention are well-known to those of skill in the art. Mutagenesis may be 
performed in accordance with any of the techniques known in the art, such as, but not limited 
to, synthesizing an oligonucleotide having one or more mutations within the sequence of a 
particular regulatory region. In particular, site-specific mutagenesis is a technique useful in 
the preparation of promoter mutants, through specific mutagenesis of the underlying DNA. 
The technique further provides a ready ability to prepare and test sequence variants, for 
example, incorporating one or more of the foregoing considerations, by introducing one or 
more nucleotide sequence changes into the DNA. Site-specific mutagenesis allows the 
production of mutants through the use of specific oligonucleotide sequences which encode 
the DNA sequence of the desired mutation, as well as a sufficient number of adjacent 
nucleotides, to provide a primer sequence of sufficient size and sequence complexity to form 
a stable duplex on both sides of the deletion junction being traversed. Typically, a primer of 
about 17 to about 75 nucleotides or more in length is preferred, with about 10 to about 25 or 
more residues on both sides of the junction of the sequence being altered. 
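
The primer geometry described above can be sketched in a few lines. The following is an illustrative sketch only, assuming a single-base substitution and a 0-based position; the function name, the default flank of 12 residues and the toy template are hypothetical and are not part of the present disclosure. 

```python
def design_mutagenic_primer(template, position, new_base, flank=12):
    """Return a primer carrying a single-base substitution at `position`.

    `position` is 0-based. The substituted base is flanked on each side by
    `flank` unaltered template residues so that a stable duplex can form
    on both sides of the altered site.
    """
    template = template.upper()
    new_base = new_base.upper()
    if not 0 <= position < len(template):
        raise ValueError("position outside template")
    if position < flank or position + flank >= len(template):
        raise ValueError("not enough flanking sequence on both sides")
    if new_base == template[position]:
        raise ValueError("substitution does not change the base")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    return left + new_base + right


# Toy 40-nt template (hypothetical); replace the A at position 20 with G.
template = "ACGT" * 10
primer = design_mutagenic_primer(template, 20, "G", flank=12)
print(primer, len(primer))
```

With a flank of 12 residues on each side the resulting primer is 25 nucleotides long, which falls within the preferred range of about 17 to about 75 nucleotides stated above. 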

In general, the technique of site-specific mutagenesis is well known in the art, as 
exemplified by various publications. As will be appreciated, the technique typically employs 
a phage vector which exists in both a single stranded and double stranded form. Typical 
vectors useful in site-directed mutagenesis include vectors such as the M13 phage. These 
phage are readily commercially available and their use is generally well known to those 
skilled in the art. Double stranded plasmids also are routinely employed in site directed 
mutagenesis which eliminates the step of transferring the gene of interest from a plasmid to a 
phage. 

Site-directed mutagenesis in accordance herewith typically is performed by first 
obtaining a single-stranded vector or melting apart the two strands of a double-stranded vector 
which includes within its sequence a DNA sequence that includes the s-ship promoter. An 
oligonucleotide primer bearing the desired mutated sequence is prepared, generally 
synthetically. This primer is then annealed with the single-stranded vector, and subjected to 
DNA polymerizing enzymes such as the E. coli polymerase I Klenow fragment, in order to 
complete the synthesis of the mutation-bearing strand. Thus, a heteroduplex is formed 
wherein one strand encodes the original non-mutated sequence and the second strand bears 
the desired mutation. This heteroduplex vector is then used to transform or transfect 
appropriate cells, such as E. coli cells, and cells are selected which include recombinant 
vectors bearing the mutated sequence arrangement. Vector DNA can then be isolated from 
these cells and used for plant transformation. A genetic selection scheme was devised by 
Kunkel et al. (1987) to enrich for clones incorporating mutagenic oligonucleotides. 
Alternatively, the use of PCR™ with commercially available thermostable enzymes such as 
Taq polymerase may be used to incorporate a mutagenic oligonucleotide primer into an 
amplified DNA fragment that can then be cloned into an appropriate cloning or expression 
vector. The PCR™-mediated mutagenesis procedures of Tomic et al. (1990) and Upender et 
al. (1995) provide two examples of such protocols. A PCR™ employing a thermostable 
ligase in addition to a thermostable polymerase also may be used to incorporate a 
phosphorylated mutagenic oligonucleotide into an amplified DNA fragment that may then be 
cloned into an appropriate cloning or expression vector. 

The preparation of sequence variants of the selected promoter DNA segments using 
site-directed mutagenesis is provided as a means of producing potentially useful promoter 
sequences and is not meant to be limiting as there are other ways in which sequence variants 
of DNA sequences may be obtained. For example, recombinant vectors encoding the desired 
promoter sequence may be treated with mutagenic agents, such as hydroxylamine, to obtain 
sequence variants. 

As used herein, the term "oligonucleotide-directed mutagenesis procedure" refers to 
template-dependent processes and vector-mediated propagation which result in an increase in 
the concentration of a specific nucleic acid molecule relative to its initial concentration, or in 
an increase in the concentration of a detectable signal, such as amplification. As used herein, 
the term "oligonucleotide directed mutagenesis procedure" also is intended to refer to a 
process that involves the template-dependent extension of a primer molecule. The term 
template-dependent process refers to nucleic acid synthesis of an RNA or a DNA molecule 
wherein the sequence of the newly synthesized strand of nucleic acid is dictated by the 
well-known rules of complementary base pairing (see, for example, Watson and Ramstad, 
1987). Typically, vector mediated methodologies involve the introduction of the nucleic acid 
fragment into a DNA or RNA vector, the clonal amplification of the vector, and the recovery 
of the amplified nucleic acid fragment. Examples of such methodologies are provided by 
U.S. Patent No. 4,237,224, specifically incorporated herein by reference in its entirety. A 
number of template dependent processes are available to amplify the target sequences of 
interest present in a sample, such methods being well known in the art and specifically 
disclosed herein below. 

One efficient, targeted means for preparing mutagenized promoters or enhancers 
relies upon the identification of putative regulatory elements within the target sequence. This 
can be initiated by comparison with, for example, promoter sequences known to be expressed 
in a similar manner. Sequences which are shared among elements with similar functions or 
expression patterns are likely candidates for the binding of transcription factors and are thus 
likely elements which confer expression patterns. Confirmation of these putative regulatory 
elements can be achieved by deletion analysis of each putative regulatory region followed by 
functional analysis of each deletion construct by assay of a reporter gene which is 
functionally attached to each construct. As such, once a starting promoter or intron sequence 
is provided, any of a number of different functional deletion mutants of the starting sequence 
could be readily prepared. 

As indicated above, deletion mutants of the s-ship promoter also could be randomly 
prepared and then assayed. With this strategy, a series of constructs are prepared, each 
containing a different portion of the clone (a subclone), and these constructs are then 
screened for activity. A suitable means for screening for activity is to attach a deleted 
promoter construct to a selectable or screenable marker, and to isolate only those cells 
expressing the marker protein. In this way, a number of different, deleted promoter 
constructs are identified which still retain the desired, or even enhanced, activity. The 
smallest segment which is required for activity is thereby identified through comparison of 
the selected constructs. This segment may then be used for the construction of vectors for the 
expression of exogenous protein. 
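
The comparison of deletion constructs can be sketched as follows. This is a minimal illustration only: the nested 5' deletion series, the function names and the toy "assay" (a test for a placeholder motif standing in for a reporter gene readout) are hypothetical assumptions and do not describe any element of the s-ship promoter. 

```python
def five_prime_deletions(sequence, step):
    """Return (start_offset, truncated_sequence) pairs for a nested 5' deletion series."""
    return [(start, sequence[start:]) for start in range(0, len(sequence), step)]


def smallest_active(constructs, is_active):
    """Return the shortest construct for which is_active(sequence) is true, or None."""
    active = [c for c in constructs if is_active(c[1])]
    return min(active, key=lambda c: len(c[1])) if active else None


# Toy 57-nt "promoter" with a placeholder motif; both are hypothetical.
promoter = "G" * 30 + "CCAATCT" + "A" * 20


def toy_reporter_assay(seq):
    # Stand-in for a reporter gene measurement on each construct.
    return "CCAATCT" in seq


series = five_prime_deletions(promoter, step=10)
start, segment = smallest_active(series, toy_reporter_assay)
print(start, len(segment))
```

In practice the activity of each construct would be measured with a selectable or screenable marker as described above, rather than computed from the sequence. 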

1. Vectors 

Promoter sequences of the invention may be comprised in a vector. The term 
"vector" is used to refer to a carrier nucleic acid molecule into which a nucleic acid sequence 
can be inserted for introduction into a cell where it can be replicated. A nucleic acid 
sequence can be "exogenous," which means that it is foreign to the cell into which the vector 
is being introduced or that the sequence is homologous to a sequence in the cell but in a 
position within the host cell nucleic acid in which the sequence is ordinarily not found. 
Vectors include plasmids, cosmids, viruses (bacteriophage, animal viruses, and plant viruses), 
and artificial chromosomes (e.g., YACs). One of skill in the art would be well equipped to 
construct a vector through standard recombinant techniques, which are described in 
Sambrook et al. (1989) and Ausubel et al. (1996), both incorporated herein by reference. In 
addition to encoding a polypeptide, a vector may encode other polypeptide sequences such as 
a tag or targeting molecule. Useful vectors encoding such fusion proteins include pIN vectors 
(Inouye et al., 1985), vectors encoding a stretch of histidines, and pGEX vectors, for use in 
generating glutathione S-transferase (GST) soluble fusion proteins for later purification and 
separation or cleavage. A targeting molecule is one that directs the modified polypeptide to a 
particular organ, tissue, cell, or other location in a subject's body. 

The term "expression vector" refers to a vector containing a nucleic acid sequence 
coding for at least part of a gene product capable of being transcribed. In some cases, RNA 
molecules are then translated into a protein, polypeptide, or peptide. In other cases, these 
sequences are not translated, for example, in the production of antisense molecules, siRNA 
molecules or miRNA molecules. In addition to s-ship promoter regions, expression vectors 
can contain a variety of other "control sequences," which refer to nucleic acid sequences 
necessary for the transcription and possibly translation of an operably linked coding sequence 
in a particular host organism. In addition to control sequences that govern transcription and 
translation, vectors and expression vectors may contain nucleic acid sequences that serve 
other functions as well and are described infra. 

In certain embodiments of the invention, the expression vector comprises a virus or 
engineered vector derived from a viral genome. The ability of certain viruses to enter cells 
via receptor-mediated endocytosis, to integrate into host cell genome and express viral genes 
stably and efficiently have made them attractive candidates for the transfer of foreign genes 
into mammalian cells (Ridgeway, 1988; Nicolas and Rubenstein, 1988; Baichwal and 
Sugden, 1986; Temin, 1986). The first viruses used as gene vectors were DNA viruses 
including the papovaviruses (simian virus 40, bovine papilloma virus, and polyoma) 
(Ridgeway, 1988; Baichwal and Sugden, 1986) and adenoviruses (Ridgeway, 1988; Baichwal 
and Sugden, 1986). These have a relatively low capacity for foreign DNA sequences and 
have a restricted host spectrum. Furthermore, their oncogenic potential and cytopathic effects 
in permissive cells raise safety concerns. They can accommodate only up to 8 kb of foreign 
genetic material but can be readily introduced in a variety of cell lines and laboratory animals 
(Nicolas and Rubenstein, 1988; Temin, 1986). 

The retroviruses are a group of single-stranded RNA viruses characterized by an 
ability to convert their RNA to double-stranded DNA in infected cells; they can also be used 
as vectors. Other viral vectors may be employed as expression constructs in the present 
invention. Vectors derived from viruses such as vaccinia virus (Ridgeway, 1988; Baichwal 
and Sugden, 1986; Coupar et al., 1988), adeno-associated virus (AAV) (Ridgeway, 1988; 
Baichwal and Sugden, 1986; Hermonat and Muzyczka, 1984) and herpesviruses may be 
employed. They offer several attractive features for various mammalian cells (Friedmann, 
1989; Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al, 1988; Horwich et al, 
1990). 

a. Promoters and Enhancers 

A promoter may be one naturally associated with a gene or sequence, as may be 
obtained by isolating the 5' non-coding sequences located upstream of the coding segment 
and/or exon. Such a promoter can be referred to as "endogenous." Similarly, an enhancer 
may be one naturally associated with a nucleic acid sequence, located either downstream or 
upstream of that sequence. Alternatively, certain advantages will be gained by positioning 
the coding nucleic acid segment under the control of a recombinant or heterologous promoter, 
which refers to a promoter that is not normally associated with a nucleic acid sequence in its 
natural environment. A recombinant or heterologous enhancer refers also to an enhancer not 
normally associated with a nucleic acid sequence in its natural environment. Such promoters 
or enhancers may include promoters or enhancers of other genes, and promoters or enhancers 
isolated from any other prokaryotic, viral, or eukaryotic cell, and promoters or enhancers not 
"naturally occurring," i.e., containing different elements of different transcriptional 
regulatory regions, and/or mutations that alter expression. In addition to producing nucleic 
acid sequences of promoters and enhancers synthetically, sequences may be produced using 
recombinant cloning and/or nucleic acid amplification technology, including PCR™ in 
connection with the compositions disclosed herein (see U.S. Patent 4,683,202, U.S. Patent 
5,928,906, each incorporated herein by reference). Furthermore, it is contemplated that the 
control sequences that direct transcription and/or expression of sequences within non-nuclear 
organelles such as mitochondria, chloroplasts, and the like, can be employed as well. 

Naturally, it may be important to employ a promoter and/or enhancer that effectively 
directs the expression of the DNA segment in the cell type, organelle, and organism chosen 
for expression. Those of skill in the art of molecular biology generally know the use of 
promoters, enhancers, and cell type combinations for protein expression, for example, see 
Sambrook et al. (1989), incorporated herein by reference. The promoters employed may be 
constitutive, tissue-specific, inducible, and/or useful under the appropriate conditions to 
direct high level expression of the introduced DNA segment, such as is advantageous in the 
large-scale production of recombinant proteins and/or peptides. The promoter may be 
heterologous or endogenous. 

In addition to the s-ship promoter, other elements/promoters may be employed, in the 
context of the present invention, to regulate the expression of a gene. Table 1 is a list of other 
promoters and enhancers that may be used in conjunction with the s-ship promoter of the 
invention; this list also identifies references that indicate how promoters can be evaluated. It 
is not intended to be exhaustive of all the possible elements involved in the promotion of 
expression but, merely, to be exemplary thereof. Table 2 provides examples of inducible 
elements, which are regions of a nucleic acid sequence that can be activated in response to a 
specific stimulus. 





TABLE 1 

Promoter and/or Enhancer 

Promoter/Enhancer | References 
Immunoglobulin Heavy Chain | Banerji et al., 1983; Gilles et al., 1983; Grosschedl et al., 1985; Atchinson et al., 1986, 1987; Imler et al., 1987; Weinberger et al., 1984; Kiledjian et al., 1988; Porton et al., 1990 
Immunoglobulin Light Chain | Queen et al., 1983; Picard et al., 1984 
T-Cell Receptor | Luria et al., 1987; Winoto et al., 1989; Redondo et al., 1990 
HLA DQ α and/or DQ β | Sullivan et al., 1987 
β-Interferon | Goodbourn et al., 1986; Fujita et al., 1987; Goodbourn et al., 1988 
Interleukin-2 | Greene et al., 1989 
Interleukin-2 Receptor | Greene et al., 1989; Lin et al., 1990 
MHC Class II 5 | Koch et al., 1989 
MHC Class II HLA-DRα | Sherman et al., 1989 
β-Actin | Kawamoto et al., 1988; Ng et al., 1989 
Muscle Creatine Kinase (MCK) | Jaynes et al., 1988; Horlick et al., 1989; Johnson et al., 1989 
Prealbumin (Transthyretin) | Costa et al., 1988 
Elastase I | Ornitz et al., 1987 
Metallothionein (MTII) | Karin et al., 1987; Culotta et al., 1989 
Collagenase | Pinkert et al., 1987; Angel et al., 1987 
Albumin | Pinkert et al., 1987; Tronche et al., 1989, 1990 
α-Fetoprotein | Godbout et al., 1988; Campere et al., 1989 
t-Globin | Bodine et al., 1987; Perez-Stable et al., 1990 
β-Globin | Trudel et al., 1987 
c-fos | Cohen et al., 1987 
c-HA-ras | Triesman, 1986; Deschamps et al., 1985 
Insulin | Edlund et al., 1985 
Neural Cell Adhesion Molecule (NCAM) | Hirsh et al., 1990 
α1-Antitrypsin | Latimer et al., 1990 
H2B (TH2B) Histone | Hwang et al., 1990 
Mouse and/or Type I Collagen | Ripe et al., 1989 
Glucose-Regulated Proteins (GRP94 and GRP78) | Chang et al., 1989 
Rat Growth Hormone | Larsen et al., 1986 
Human Serum Amyloid A (SAA) | Edbrooke et al., 1989 
Troponin I (TNI) | Yutzey et al., 1989 
Platelet-Derived Growth Factor (PDGF) | Pech et al., 1989 
Duchenne Muscular Dystrophy | Klamut et al., 1990 
SV40 | Banerji et al., 1981; Moreau et al., 1981; Sleigh et al., 1985; Firak et al., 1986; Herr et al., 1986; Imbra et al., 1986; Kadesch et al., 1986; Wang et al., 1986; Ondek et al., 1987; Kuhl et al., 1987; Schaffner et al., 1988 
Polyoma | Swartzendruber et al., 1975; Vasseur et al., 1980; Katinka et al., 1980, 1981; Tyndell et al., 1981; Dandolo et al., 1983; de Villiers et al., 1984; Hen et al., 1986; Satake et al., 1988; Campbell et al., 1988 
Retroviruses | Kriegler et al., 1982, 1983; Levinson et al., 1982; Kriegler et al., 1983, 1984a, b, 1988; Bosze et al., 1986; Miksicek et al., 1986; Celander et al., 1987; Thiesen et al., 1988; Celander et al., 1988; Choi et al., 1988; Reisman et al., 1989 
Papilloma Virus | Campo et al., 1983; Lusky et al., 1983; Spandidos and Wilkie, 1983; Spalholz et al., 1985; Lusky et al., 1986; Cripe et al., 1987; Gloss et al., 1987; Hirochika et al., 1987; Stephens et al., 1987 
Hepatitis B Virus | Bulla et al., 1986; Jameel et al., 1986; Shaul et al., 1987; Spandau et al., 1988; Vannice et al., 1988 
Human Immunodeficiency Virus | Muesing et al., 1987; Hauber et al., 1988; Jakobovits et al., 1988; Feng et al., 1988; Takebe et al., 1988; Rosen et al., 1988; Berkhout et al., 1989; Laspia et al., 1989; Sharp et al., 1989; Braddock et al., 1989 
Cytomegalovirus (CMV) | Weber et al., 1984; Boshart et al., 1985; Foecking et al., 1986 
Gibbon Ape Leukemia Virus | Holbrook et al., 1987; Quinn et al., 1989 






TABLE 2 

Inducible Elements 

Element | Inducer | References 
MT II | Phorbol Ester (TPA), Heavy metals | Palmiter et al., 1982; Haslinger et al., 1985; Searle et al., 1985; Stuart et al., 1985; Imagawa et al., 1987; Karin et al., 1987; Angel et al., 1987b; McNeall et al., 1989 
MMTV (mouse mammary tumor virus) | Glucocorticoids | Huang et al., 1981; Lee et al., 1981; Majors et al., 1983; Chandler et al., 1983; Lee et al., 1984; Ponta et al., 1985; Sakai et al., 1988 
β-Interferon | poly(rI)x, poly(rc) | Tavernier et al., 1983 
Adenovirus 5 E2 | E1A | Imperiale et al., 1984 
Collagenase | Phorbol Ester (TPA) | Angel et al., 1987a 
Stromelysin | Phorbol Ester (TPA) | Angel et al., 1987b 
SV40 | Phorbol Ester (TPA) | Angel et al., 1987b 
Murine MX Gene | Interferon, Newcastle Disease Virus | Hug et al., 1988 
GRP78 Gene | A23187 | Resendez et al., 1988 
α-2-Macroglobulin | IL-6 | Kunz et al., 1989 
Vimentin | Serum | Rittling et al., 1989 
MHC Class I Gene H-2κb | Interferon | Blanar et al., 1989 
HSP70 | E1A, SV40 Large T Antigen | Taylor et al., 1989, 1990a, 1990b 
Proliferin | Phorbol Ester-TPA | Mordacq et al., 1989 
Tumor Necrosis Factor | PMA | Hensel et al., 1989 
Thyroid Stimulating Hormone α Gene | Thyroid Hormone | Chatterjee et al., 1989 



The identity of tissue-specific promoters or elements, as well as assays to characterize 
their activity, is well known to those of skill in the art. Examples of such regions include the 
human LIMK2 gene (Nomoto et al., 1999), the somatostatin receptor 2 gene (Kraus et al., 
1998), murine epididymal retinoic acid-binding gene (Lareyre et al., 1999), human CD4 
(Zhao-Emonet et al., 1998), mouse alpha2 (XI) collagen (Tsumaki et al., 1998), D1A 
dopamine receptor gene (Lee et al., 1997), insulin-like growth factor II (Wu et al., 1997), 
human platelet endothelial cell adhesion molecule-1 (Almendro et al., 1996), and the SM22α 
promoter. 

b. Initiation Signals and Internal Ribosome Binding Sites 

A specific initiation signal also may be required for efficient translation of coding 
sequences. These signals include the ATG initiation codon or adjacent sequences. 
Exogenous translational control signals, including the ATG initiation codon, may need to be 
provided. One of ordinary skill in the art would readily be capable of determining this and 
providing the necessary signals. It is well known that the initiation codon must be "in-frame" 
with the reading frame of the desired coding sequence to ensure translation of the entire 
insert. The exogenous translational control signals and initiation codons can be either natural 
or synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate 
transcription enhancer elements. 
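
The "in-frame" requirement can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name, the 0-based indices and the toy construct are hypothetical, and the check considers only the reading frame and intervening stop codons between an exogenous ATG and the start of the desired coding sequence. 

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_in_frame(construct, atg_index, cds_start):
    """Return True if the ATG at atg_index is in frame with the coding
    sequence beginning at cds_start and no stop codon lies in between.

    Both indices are 0-based positions in `construct`.
    """
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")
    if cds_start < atg_index or (cds_start - atg_index) % 3 != 0:
        return False
    for i in range(atg_index, cds_start, 3):
        if construct[i:i + 3] in STOP_CODONS:
            return False
    return True


# Hypothetical construct: 6-nt leader, ATG, 3-nt linker, then the insert.
in_frame = "GCCACC" + "ATG" + "GGA" + "GAAGTTCTG"
# Same construct with a 4-nt linker, which shifts the insert out of frame.
shifted = "GCCACC" + "ATG" + "GGAT" + "GAAGTTCTG"

print(atg_in_frame(in_frame, 6, 12))
print(atg_in_frame(shifted, 6, 13))
```

A single additional nucleotide between the initiation codon and the insert is sufficient to shift the reading frame, which is why the frame is normally confirmed before a construct is assembled. 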

In certain embodiments of the invention, internal ribosome entry site (IRES) elements 
are used to create multigene, or polycistronic, messages. IRES elements are 
able to bypass the ribosome scanning model of 5' methylated cap-dependent translation and 
begin translation at internal sites (Pelletier and Sonenberg, 1988). IRES elements from two 
members of the picornavirus family (polio and encephalomyocarditis) have been described 
(Pelletier and Sonenberg, 1988), as well as an IRES from a mammalian message (Macejak and 
Sarnow, 1991). IRES elements can be linked to heterologous open reading frames. Multiple 
open reading frames can be transcribed together, each separated by an IRES, creating 
polycistronic messages. By virtue of the IRES element, each open reading frame is 
accessible to ribosomes for efficient translation. Multiple genes can be efficiently expressed 
using a single promoter/enhancer to transcribe a single message (see U.S. Patent 5,925,565 
and 5,935,819, herein incorporated by reference). 

c. Multiple Cloning Sites 

Vectors can include a multiple cloning site (MCS), which is a nucleic acid region that 
contains multiple restriction enzyme sites, any of which can be used in conjunction with 
standard recombinant technology to digest the vector. (See Carbonelli et al., 1999, Levenson 
et al., 1998, and Cocea, 1997, incorporated herein by reference.) "Restriction enzyme 
digestion" refers to catalytic cleavage of a nucleic acid molecule with an enzyme that 
functions only at specific locations in a nucleic acid molecule. Many of these restriction 
enzymes are commercially available. Use of such enzymes is widely understood by those of 
skill in the art. Frequently, a vector is linearized or fragmented using a restriction enzyme 
that cuts within the MCS to enable exogenous sequences to be ligated to the vector. 
"Ligation" refers to the process of forming phosphodiester bonds between two nucleic acid 
fragments, which may or may not be contiguous with each other. Techniques involving 
restriction enzymes and ligation reactions are well known to those of skill in the art of 
recombinant technology. 
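
Selecting an enzyme that linearizes a vector within the MCS amounts to finding recognition sequences that occur exactly once in the vector and lie inside the MCS. The following is an illustrative sketch only: the four recognition sequences shown are the well-known sites of those enzymes, while the toy vector, its coordinates and the function names are hypothetical. Because these sites are palindromic, scanning one strand is sufficient. 

```python
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def single_cutters(vector, mcs_start, mcs_end):
    """Map enzyme name to site position for enzymes that cut the vector
    exactly once, with that single site lying within [mcs_start, mcs_end)."""
    result = {}
    for enzyme, site in RECOGNITION_SITES.items():
        hits = find_sites(vector, site)
        if len(hits) == 1 and mcs_start <= hits[0] and hits[0] + len(site) <= mcs_end:
            result[enzyme] = hits[0]
    return result


# Hypothetical vector: the backbone carries a second HindIII site.
left = "ATATATAT" + "AAGCTT" + "CGCGCGCG"
mcs = "GAATTC" + "GGATCC" + "AAGCTT" + "CTCGAG"
right = "TTTTAAAA"
vector = left + mcs + right

print(single_cutters(vector, len(left), len(left) + len(mcs)))
```

In this toy example HindIII is excluded because it also cuts the backbone, so digestion with it would fragment rather than linearize the vector. 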

d. Splicing Sites 

Most transcribed eukaryotic RNA molecules will undergo RNA splicing to remove 
introns from the primary transcripts. Vectors containing genomic eukaryotic sequences may 
require donor and/or acceptor splicing sites to ensure proper processing of the transcript for 
protein expression. (See Chandler et al., 1997, incorporated herein by reference.) 

e. Termination Signals 

The vectors or constructs of the present invention will generally comprise at least one 
termination signal. A "termination signal" or "terminator" is comprised of the DNA 
sequences involved in specific termination of an RNA transcript by an RNA polymerase. 
Thus, in certain embodiments a termination signal that ends the production of an RNA 
transcript is contemplated. A terminator may be necessary in vivo to achieve desirable 
message levels. 

In eukaryotic systems, the terminator region may also comprise specific DNA 
sequences that permit site-specific cleavage of the new transcript so as to expose a 
polyadenylation site. This signals a specialized endogenous polymerase to add a stretch of 
about 200 A residues (polyA) to the 3' end of the transcript. RNA molecules modified with 
this polyA tail appear to be more stable and are translated more efficiently. Thus, in other 
embodiments involving eukaryotes, it is preferred that the terminator comprises a signal for 
the cleavage of the RNA, and it is more preferred that the terminator signal promotes 
polyadenylation of the message. The terminator and/or polyadenylation site elements can 
serve to enhance message levels and/or to minimize read through from the cassette into other 
sequences. 

Terminators contemplated for use in the invention include any known terminator of 
transcription described herein or known to one of ordinary skill in the art, including but not 
limited to, for example, the termination sequences of genes, such as for example the bovine 
growth hormone terminator or viral termination sequences, such as for example the SV40 
terminator. In certain embodiments, the termination signal may be a lack of transcribable or 
translatable sequence, such as due to a sequence truncation. 

f. Polyadenylation Signals 

In expression, particularly eukaryotic expression, one will typically include a 
polyadenylation signal to effect proper polyadenylation of the transcript. The nature of the 
polyadenylation signal is not believed to be crucial to the successful practice of the invention, 
and any such sequence may be employed. Preferred embodiments include the SV40 
polyadenylation signal and/or the bovine growth hormone polyadenylation signal, which are 
convenient and/or known to function well in various target cells. Polyadenylation may increase the 
stability of the transcript or may facilitate cytoplasmic transport. 

g. Origins of Replication 

In order to propagate a vector in a host cell, it may contain one or more origin of 
replication sites (often termed "ori"), each of which is a specific nucleic acid sequence at which 
replication is initiated. Alternatively, an autonomously replicating sequence (ARS) can be 
employed if the host cell is yeast. 

h. Selectable and Screenable Markers 

In certain embodiments of the invention, cells containing a nucleic acid construct of 
the present invention may be identified in vitro or in vivo by including a marker in the 
expression vector. Such markers would confer an identifiable change to the cell permitting 
easy identification of cells containing the expression vector. Generally, a selectable marker is 
one that confers a property that allows for selection. A positive selectable marker is one in 
which the presence of the marker allows for its selection, while a negative selectable marker 
is one in which its presence prevents its selection. An example of a positive selectable 
marker is a drug resistance marker. 

Usually the inclusion of a drug selection marker aids in the cloning and identification 
of transformants. For example, genes that confer resistance to neomycin, puromycin, 
hygromycin, DHFR, GPT, zeocin and histidinol are useful selectable markers. 

In addition to markers conferring a phenotype that allows for the discrimination of 
transformants based on the implementation of conditions, other types of markers including 
screenable markers such as GFP, whose basis is colorimetric analysis, are also contemplated. 



Alternatively, screenable enzymes such as herpes simplex virus thymidine kinase (tk), 
chloramphenicol acetyltransferase (CAT), or luciferase may be utilized. One of skill in the 
art would also know how to employ immunologic markers, possibly in conjunction with 
FACS analysis. The marker used is not believed to be important, so long as it is capable of 
being expressed simultaneously with the nucleic acid encoding a gene product. Further 
examples of selectable and screenable markers are well known to one of skill in the art. 

2. Heterologous Sequences

It is contemplated that polynucleotides of the invention include an s-ship promoter
region controlling the expression of a heterologous nucleic acid sequence. The sequence may
be a gene, cDNA sequence or an untranslated sequence, such as an siRNA. The invention is
not limited to any specific sequence, but in certain embodiments, the heterologous sequence
encodes any of the following proteins or RNAs.

Table 3 below provides different classes of proteins, and in some cases, examples of
those proteins.






TABLE 3

| Protein Genus | Protein Subgenus | Protein Species | Protein Subspecies |
|---|---|---|---|
| 1) Toxins | Ribosome Inhibitory Proteins | | |
| | | Gelonin | |
| | | Ricin A Chain | |
| | | Pseudomonas Exotoxin | |
| | | Diphtheria Toxin | |
| | | Mitogillin | |
| | | Saporin | |
| 2) Cytokines/Growth Factors | Interleukins | IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-14, IL-15, IL-16, IL-17, IL-18, IL-19 | |
| | TNF | | |
| | LT | | |
| | Interferons | IFN-α, IFN-β, IFN-γ | |
| | Colony Stimulating Factors | GM-CSF, G-CSF, M-CSF, CSF | |
| | LIF | | |
| | Fibroblast Growth Factors | bFGF, FGF, FGF-1, FGF-2, FGF-3, FGF-4, FGF-8, FGF-9, FGF-10, FGF-18, FGF-20, FGF-23 | |
| | VEGF | | |
| 3) Enzymes | Oxidoreductases | | |
| | Transferases | Transferring one-carbon groups | Methyltransferases |
| | | | Carboxyl and carbamoyltransferases |
| | | | Amidinotransferases |
| | | Transferring aldehyde or ketone residues | |
| | | Acyltransferases | Acyltransferases |
| | | | Aminoacyltransferases |
| | | Glycosyltransferases | Hexosyltransferases |
| | | Transferring alkyl or aryl groups, other than methyl groups | |
| | | Transferring nitrogenous groups | Transaminases |
| | | | Oximinotransferases |
| | | Transferring phosphorus-containing groups | Phosphotransferases |
| | | | Diphosphotransferases |
| | | | Nucleotidyltransferases |
| | | Transferring sulfur-containing groups | Sulfur-transferases |
| | | | Sulfotransferases |
| | | | CoA-transferases |
| | | Transferring selenium-containing groups | |
| | Hydrolases | Acting on ester bonds | |
| | | Glycosylases | |
| | | Acting on ether bonds | |
| | | Acting on peptide bonds (peptide hydrolases) | |
| | | Acting on carbon-nitrogen bonds, other than peptide bonds | |
| | | Acting on acid anhydrides | |
| | | Acting on carbon-carbon bonds | |
| | | Acting on halide bonds | |
| | | Acting on phosphorus-nitrogen bonds | |
| | | Acting on sulfur-nitrogen bonds | |
| | | Acting on carbon-phosphorus bonds | |
| | | Acting on sulfur-sulfur bonds | |
| | Lyases | Carbon-carbon lyases | |
| | | Carbon-oxygen lyases | |
| | | Carbon-nitrogen lyases | |
| | | Carbon-sulfur lyases | |
| | | Carbon-halide lyases | |
| | | Phosphorus-oxygen lyases | |
| | Isomerases | Racemases and epimerases | |
| | | Cis-trans-isomerases | |
| | | Intramolecular oxidoreductases | |
| | | Intramolecular transferases (mutases) | |
| | | Phosphotransferases (phosphomutases) | |
| | Ligases | Forming carbon-oxygen bonds | |
| | | Forming carbon-sulfur bonds | |
| | | Forming carbon-nitrogen bonds | |
| | | Forming carbon-carbon bonds | |
| | | Forming phosphoric ester bonds | |

Other examples include but are not limited to the following: 

a. Cytokines

Another class of compounds that is contemplated to be operatively linked to a
therapeutic polypeptide, such as a toxin, includes interleukins and cytokines, such as
interleukin 1 (IL-1), IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13,
IL-14, IL-15, β-interferon, α-interferon, γ-interferon, angiostatin, thrombospondin,
endostatin, METH-1, METH-2, Flk2/Flt3 ligand, GM-CSF, G-CSF, M-CSF, and tumor
necrosis factor (TNF).

c. Growth Factors 

In other embodiments of the present invention, growth factors or ligands can be
complexed with the therapeutic agent. Examples include VEGF/VPF, FGF, TGF-β, ligands
that bind to a TIE, tumor-associated fibronectin isoforms, scatter factor, hepatocyte growth
factor, fibroblast growth factor, platelet factor (PF4), PDGF, KIT ligand (KL), colony
stimulating factors (CSFs), LIF, and TIMP.

d. Inducers of Cellular Proliferation 

Another group of proteins that may be used in conjunction with modified proteins of
the present invention, such as modified gelonin toxin, comprises proteins that induce cellular
proliferation. In some embodiments, the toxin is operatively linked to a ribozyme that can
inactivate an inducer of cellular proliferation, while in others, the toxin is linked to the
inducer itself. Alternatively, a toxin may be attached to an antibody that recognizes an
inducer of cell proliferation.

The commonality of all of these proteins is their ability to regulate cellular 
proliferation. For example, a form of PDGF, the sis oncogene, is a secreted growth factor. 
Oncogenes rarely arise from genes encoding growth factors, and at the present, sis is the only 
known naturally-occurring oncogenic growth factor. In one embodiment of the present 
invention, it is contemplated that anti-sense mRNA directed to a particular inducer of cellular 
proliferation is used to prevent expression of the inducer of cellular proliferation. 

The proteins FMS, ErbA, ErbB and neu are growth factor receptors. Mutations to 
these receptors result in loss of regulatable function. For example, a point mutation affecting 
the transmembrane domain of the Neu receptor protein results in the neu oncogene. The 
erbA oncogene is derived from the intracellular receptor for thyroid hormone. The modified 
oncogenic ErbA receptor is believed to compete with the endogenous thyroid hormone 
receptor, causing uncontrolled growth. 

The largest class of oncogenes includes the signal transducing proteins (e.g., Src, Abl
and Ras). The protein Src is a cytoplasmic protein-tyrosine kinase, and its transformation
from proto-oncogene to oncogene, in some cases, results from mutations at tyrosine residue
527. In contrast, transformation of the GTPase protein ras from proto-oncogene to oncogene, in
one example, results from a glycine to valine mutation at amino acid 12 in the sequence,
reducing ras GTPase activity.

The proteins Jun, Fos and Myc are proteins that directly exert their effects on nuclear 
functions as transcription factors. 

e. Inhibitors of Cellular Proliferation

The tumor suppressors function to inhibit excessive cellular proliferation. The
inactivation of these genes destroys their inhibitory activity, resulting in unregulated
proliferation. It is contemplated that toxins may be attached to antibodies that recognize
mutant tumor suppressors or wild-type tumor suppressors. Alternatively, a toxin may be
linked to all or part of the tumor suppressor. The tumor suppressors p53, p16 and C-CAM
are described below.

High levels of mutant p53 have been found in many cells transformed by chemical
carcinogenesis, ultraviolet radiation, and several viruses. The p53 gene is a frequent target of
mutational inactivation in a wide variety of human tumors and is already documented to be
the most frequently mutated gene in common human cancers. It is mutated in over 50% of
human NSCLC (Hollstein et al., 1991) and in a wide spectrum of other tumors.

The p53 gene encodes a 393-amino acid phosphoprotein that can form complexes
with host proteins such as large-T antigen and E1B. The protein is found in normal tissues
and cells, but at concentrations which are minute by comparison with transformed cells or
tumor tissue.

Wild-type p53 is recognized as an important growth regulator in many cell types.
Missense mutations are common for the p53 gene and are essential for the transforming
ability of the oncogene. A single genetic change prompted by point mutations can create
carcinogenic p53. Unlike other oncogenes, however, p53 point mutations are known to occur
in at least 30 distinct codons, often creating dominant alleles that produce shifts in cell
phenotype without a reduction to homozygosity. Additionally, many of these dominant
negative alleles appear to be tolerated in the organism and passed on in the germ line.
Various mutant alleles appear to range from minimally dysfunctional to strongly penetrant,
dominant negative alleles (Weinberg, 1991).

Another inhibitor of cellular proliferation is p16. The major transitions of the
eukaryotic cell cycle are triggered by cyclin-dependent kinases, or CDKs. One CDK,
cyclin-dependent kinase 4 (CDK4), regulates progression through G1. The activity of this
enzyme may be to phosphorylate Rb at late G1. The activity of CDK4 is controlled by an
activating subunit, D-type cyclin, and by an inhibitory subunit, p16INK4. p16INK4 has been
biochemically characterized as a protein that specifically binds to and inhibits CDK4, and
thus may regulate Rb phosphorylation (Serrano et al., 1993; Serrano et al., 1995). Since the
p16INK4 protein is a CDK4 inhibitor (Serrano, 1993), deletion of this gene may increase the
activity of CDK4, resulting in hyperphosphorylation of the Rb protein. p16 also is known to
regulate the function of CDK6.

p16INK4 belongs to a newly described class of CDK-inhibitory proteins that also
includes p16B, p19, p21WAF1, and p27KIP1. The p16INK4 gene maps to 9p21, a chromosome
region frequently deleted in many tumor types. Homozygous deletions and mutations of the
p16INK4 gene are frequent in human tumor cell lines. This evidence suggests that the p16INK4
gene is a tumor suppressor gene. This interpretation has been challenged, however, by the
observation that the frequency of the p16INK4 gene alterations is much lower in primary
uncultured tumors than in cultured cell lines (Caldas et al., 1994; Cheng et al., 1994;
Hussussian et al., 1994; Kamb et al., 1994; Kamb et al., 1994; Mori et al., 1994; Okamoto et
al., 1994; Nobori et al., 1995; Orlow et al., 1994; Arap et al., 1995). Restoration of wild-type
p16INK4 function by transfection with a plasmid expression vector reduced colony formation
by some human cancer cell lines (Okamoto, 1994; Arap, 1995).

Other genes that may be employed according to the present invention include Rb,
APC, mda-7, fus-1, FHIT, p16, DCC, NF-1, NF-2, WT-1, MEN-I, MEN-II, zac1, p73, VHL,
MMAC1/PTEN, DBCCR-1, FCC, rsk-3, p27, p27/p16 fusions, p21/p27 fusions, anti-thrombotic
genes (e.g., COX-1, TFPI), PGS, Dp, E2F, ras, myc, neu, raf, erb, fms, trk, ret,
gsp, hst, abl, E1A, p300, genes involved in angiogenesis (e.g., VEGF, FGF, thrombospondin,
BAI-1, GDAIF, or their receptors) and MCC.

Other examples are provided in Table 4 below. 

f. Regulators of Programmed Cell Death 

Apoptosis, or programmed cell death, is an essential process for normal embryonic
development, maintaining homeostasis in adult tissues, and suppressing carcinogenesis (Kerr
et al., 1972). The Bcl-2 family of proteins and ICE-like proteases have been demonstrated to
be important regulators and effectors of apoptosis in other systems. The Bcl-2 protein,
discovered in association with follicular lymphoma, plays a prominent role in controlling
apoptosis and enhancing cell survival in response to diverse apoptotic stimuli (Bakhshi et al.,
1985; Cleary and Sklar, 1985; Cleary et al., 1986; Tsujimoto et al., 1985; Tsujimoto and
Croce, 1986). The evolutionarily conserved Bcl-2 protein now is recognized to be a member
of a family of related proteins, which can be categorized as death agonists or death
antagonists.

Apo2 ligand (Apo2L, also called TRAIL) is a member of the tumor necrosis factor
(TNF) cytokine family. TRAIL activates rapid apoptosis in many types of cancer cells, yet is
not toxic to normal cells. TRAIL mRNA occurs in a wide variety of tissues. Most normal
cells appear to be resistant to TRAIL's cytotoxic action, suggesting the existence of
mechanisms that can protect against apoptosis induction by TRAIL. The first receptor
described for TRAIL, called death receptor 4 (DR4), contains a cytoplasmic "death domain";
DR4 transmits the apoptosis signal carried by TRAIL. Additional receptors have been
identified that bind to TRAIL. One receptor, called DR5, contains a cytoplasmic death
domain and signals apoptosis much like DR4. The DR4 and DR5 mRNAs are expressed in
many normal tissues and tumor cell lines. Recently, decoy receptors such as DcR1 and DcR2
have been identified that prevent TRAIL from inducing apoptosis through DR4 and DR5.
These decoy receptors thus represent a novel mechanism for regulating sensitivity to a
pro-apoptotic cytokine directly at the cell's surface. The preferential expression of these
inhibitory receptors in normal tissues suggests that TRAIL may be useful as an anticancer agent
that induces apoptosis in cancer cells while sparing normal cells (Marsters et al., 1999).

Subsequent to its discovery, it was shown that Bcl-2 acts to suppress cell death
triggered by a variety of stimuli. Also, it now is apparent that there is a family of Bcl-2 cell
death regulatory proteins which share in common structural and sequence homologies. These
different family members have been shown to either possess similar functions to Bcl-2 (e.g.,
Bcl-xL, Bcl-w, Bcl-s, Mcl-1, A1, Bfl-1) or counteract Bcl-2 function and promote cell death
(e.g., Bax, Bak, Bik, Bim, Bid, Bad, Harakiri). It is contemplated that any of these
polypeptides, including TRAIL, or any other polypeptides that induce or promote
apoptosis, may be operatively linked to a toxin, or that an antibody recognizing any of these
polypeptides may also be attached to a toxin.

Granzyme enzymes are also capable of inducing apoptosis. These include Granzyme 
A and Granzyme B. 

Other examples are provided in Table 4 below. 

TABLE 4

| Gene | Source | Human Disease | Function |
|---|---|---|---|
| GROWTH FACTORS | | | |
| HST/KS | Transfection | | FGF family member |
| INT-2 | MMTV promoter insertion | | FGF family member |
| INT1/WNT1 | MMTV promoter insertion | | Factor-like |
| SIS | Simian sarcoma virus | | PDGF B |
| RECEPTOR TYROSINE KINASES | | | |
| ERBB/HER | Avian erythroblastosis virus; ALV promoter insertion; amplified human tumors | Amplified, deleted squamous cell cancer; glioblastoma | EGF/TGF-α/amphiregulin/betacellulin receptor |
| ERBB-2/NEU/HER-2 | Transfected from rat glioblastomas | Amplified breast, ovarian, gastric cancers | Regulated by NDF/heregulin and EGF-related factors |
| FMS | SM feline sarcoma virus | | CSF-1 receptor |
| KIT | HZ feline sarcoma virus | | MGF/Steel receptor; hematopoiesis |
| TRK | Transfection from human colon cancer | | NGF (nerve growth factor) receptor |
| MET | Transfection from human osteosarcoma | | Scatter factor/HGF receptor |
| RET | Translocations and point mutations | Sporadic thyroid cancer; familial medullary thyroid cancer; multiple endocrine neoplasias 2A and 2B | Orphan receptor Tyr kinase |
| ROS | URII avian sarcoma virus | | Orphan receptor Tyr kinase |
| PDGF receptor | Translocation | Chronic myelomonocytic leukemia | TEL (ETS-like transcription factor)/PDGF receptor gene fusion |
| TGF-β receptor | | Colon carcinoma mismatch mutation target | |
| NONRECEPTOR TYROSINE KINASES | | | |
| ABL | Abelson MuLV | Chronic myelogenous leukemia translocation with BCR | Interact with RB, RNA polymerase, CRK, CBL |
| FPS/FES | Avian Fujinami SV; GA FeSV | | |
| LCK | MuLV (murine leukemia virus) promoter insertion | | Src family; T cell signaling; interacts CD4/CD8 T cells |
| SRC | Avian Rous sarcoma virus | | Membrane-associated Tyr kinase with signaling function; activated by receptor kinases |
| YES | Avian Y73 virus | | Src family; signaling |
| SER/THR PROTEIN KINASES | | | |
| AKT | AKT8 murine retrovirus | | Regulated by PI(3)K?; regulate 70-kd S6 k? |
| MOS | Maloney murine SV | | GVBD; cytostatic factor; MAP kinase kinase |
| PIM-1 | Promoter insertion, mouse | | |
| RAF/MIL | 3611 murine SV; MH2 avian SV | | Signaling in RAS pathway |
| MISCELLANEOUS CELL SURFACE | | | |
| APC | Tumor suppressor | Colon cancer | Interacts with catenins |
| DCC | Tumor suppressor | Colon cancer | CAM domains |
| E-cadherin | Candidate tumor suppressor | Breast cancer | Extracellular homotypic binding; intracellular interacts with catenins |
| PTC/NBCCS | Tumor suppressor and Drosophila homology | Nevoid basal cell cancer syndrome (Gorlin syndrome) | 12 transmembrane domain; signals through Gli homologue CI to antagonize hedgehog pathway |
| TAN-1 Notch homologue | Translocation | T-ALL | Signaling |
| MISCELLANEOUS SIGNALING | | | |
| BCL-2 | Translocation | B-cell lymphoma | Apoptosis |
| CBL | Mu Cas NS-1 V | | Tyrosine-phosphorylated RING finger interact Abl |
| CRK | CT1010 ASV | | Adapted SH2/SH3 interact Abl |
| DPC4 | Tumor suppressor | Pancreatic cancer | TGF-β-related signaling pathway |
| MAS | Transfection and tumorigenicity | | Possible angiotensin receptor |
| NCK | | | Adaptor SH2/SH3 |
| GUANINE NUCLEOTIDE EXCHANGERS AND BINDING PROTEINS | | | |
| BCR | | Translocated with ABL in CML | Exchanger; protein kinase |
| DBL | Transfection | | Exchanger |
| GSP | | | |
| NF-1 | Hereditary tumor suppressor | Tumor suppressor neurofibromatosis | RAS GAP |
| OST | Transfection | | Exchanger |
| Harvey-Kirsten, N-RAS | HaRat SV; Ki RaSV; Balb-MoMuSV; transfection | Point mutations in many human tumors | Signal cascade |
| VAV | Transfection | | S112/S113; exchanger |
| NUCLEAR PROTEINS AND TRANSCRIPTION FACTORS | | | |
| BRCA1 | Heritable suppressor | Mammary cancer/ovarian cancer | Localization unsettled |
| BRCA2 | Heritable suppressor | Mammary cancer | Function unknown |
| ERBA | Avian erythroblastosis virus | | Thyroid hormone receptor (transcription) |
| ETS | Avian E26 virus | | DNA binding |
| EVI1 | MuLV promoter insertion | AML | Transcription factor |
| FOS | FBJ/FBR murine osteosarcoma viruses | | |
| GLI | Amplified glioma | | |
| HMGI/LIM | Translocation t(3:12), t(12:15) | | |
| JUN | ASV-17 | | |
| MLL/VHRX + ELI/MEN | Translocation/fusion ELL with MLL Trithorax-like gene | | |
| MYB | Avian myeloblastosis virus | | |
| MYC | Avian MC29; translocation B-cell | | |
| N-MYC | | | |
| L-MYC | | | |
| REL | | | |
| SKI | | | |
| VHL | | | |
| WT-1 | | | |


As discussed above, other heterologous sequences include those that can be used as a
reporter, such as a screenable or selectable marker. Included in this category are colorimetric
or fluorescent reporters (GFP, β-gal, etc.), enzymatic reporters (CAT, luciferase, horseradish
peroxidase, etc.), or drug resistance reporters.

Furthermore, it is contemplated that encoded sequences may be fusion proteins, 
fragments or portions of proteins (including peptides), chimeric proteins, as well as non- 
protein molecules such as functional RNA molecules (tRNA, rRNA, miRNA, siRNA, 
antisense, ribozyme, etc.). Such sequences are well known to those of skill in the art. 

Moreover, in some embodiments the nucleic acid sequence encodes a cell-surface marker or
an antibody. Cell-surface molecules include, but are not limited to, cluster of differentiation
antigens (CD), CTLA, CMRF, and cellular adhesion molecules (CAM), such as CD4,
CD8, CMRF83, CTLA4, etc. In some embodiments of the invention, these molecules are
used to identify cells with stem-cell properties. For example, the cell-surface molecule can be
used to identify and/or segregate such cells. FACS analysis can be implemented for this
purpose.

B. Host Cells

The invention includes host cells transfected, transformed, or infected with a
recombinant nucleic acid sequence discussed in this application. Such host cells would be
considered recombinant host cells. The mode of transmission of the nucleic acid sequence
into the host cell is of no significant consequence with respect to the invention; therefore, the
terms "transfected," "transformed," and "infected" are used interchangeably unless otherwise
specified.

As used herein, the terms "cell," "cell line," and "cell culture" may be used
interchangeably. All of these terms also include their progeny, which is any and all
subsequent generations. It is understood that all progeny may not be identical due to
deliberate or inadvertent mutations. In the context of expressing a heterologous nucleic acid
sequence, "host cell" refers to a prokaryotic or eukaryotic cell, and it includes any
transformable organism that is capable of replicating a vector and/or expressing a
heterologous gene encoded by a vector. A host cell can be, and has been, used as a recipient for
vectors. A host cell may be "transfected" or "transformed," which refers to a process by
which exogenous nucleic acid, such as an s-ship promoter sequence, is transferred or
introduced into the host cell. A transformed cell includes the primary subject cell and its
progeny.

Host cells may be derived from prokaryotes or eukaryotes, including yeast cells,
insect cells, and mammalian cells, depending upon whether the desired result is replication of
the vector or expression of part or all of the vector-encoded nucleic acid sequences.
Numerous cell lines and cultures are available for use as a host cell, and they can be obtained
through the American Type Culture Collection (ATCC), which is an organization that serves
as an archive for living cultures and genetic materials (World Wide Web at atcc.org). An
appropriate host can be determined by one of skill in the art based on the vector backbone
and the desired result. A plasmid or cosmid, for example, can be introduced into a
prokaryote host cell for replication of many vectors. Bacterial cells used as host cells for
vector replication and/or expression include DH5α, JM109, and KC8, as well as a number of
commercially available bacterial hosts such as SURE® Competent Cells and SOLOPACK™
Gold Cells (STRATAGENE®, La Jolla, CA). Alternatively, bacterial cells such as E. coli
LE392 could be used as host cells for phage viruses. Appropriate yeast cells include
Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Pichia pastoris.

Examples of eukaryotic host cells for replication and/or expression of a vector include
HeLa, NIH3T3, Jurkat, 293, Cos, CHO, Saos, and PC12. Stem cell lines and other immature
cell lines are specifically contemplated as suitable host cells of the invention. Many host cells
from various cell types and organisms are available and would be known to one of skill in the
art. Similarly, a viral vector may be used in conjunction with either a eukaryotic or
prokaryotic host cell, particularly one that is permissive for replication or expression of the
vector.



Some vectors may employ control sequences that allow them to be replicated and/or
expressed in both prokaryotic and eukaryotic cells. One of skill in the art would further 
understand the conditions under which to incubate all of the above described host cells to 
maintain them and to permit replication of a vector. Also understood and known are 
techniques and conditions that would allow large-scale production of vectors, as well as 
production of the nucleic acids encoded by vectors and their cognate polypeptides, proteins, 
or peptides. 

In certain embodiments of the invention, a host cell refers to a cell obtained from a
subject that is transfected, transformed, or infected with an s-ship promoter region. The
promoter region may integrate into the cell's genome or it may exist in the cell
extrachromosomally. Moreover, it may be operably linked to a nucleic acid sequence whose
expression is controlled by the s-ship promoter region. In further embodiments, the nucleic
acid sequence is a heterologous sequence, meaning it is not one associated with an s-ship
promoter in nature. In some cases, the heterologous sequence is a reporter gene. In other
cases, it is a therapeutic or diagnostic nucleic acid, meaning that the resulting transcript or
protein can be used as a therapeutic or diagnostic with respect to the host cell. It is
contemplated that the host cell may be obtained from a subject, transfected or infected with the
s-ship promoter region, which may or may not be operably linked to a therapeutic or diagnostic
nucleic acid, and returned to the subject. This ex vivo approach can be used in a variety
of contexts, including but not limited to, the treatment of cancer or other hyperproliferative
diseases or disorders, autoimmune disease, diseases or disorders involving stem cells,
diseases or disorders treatable with stem cells, and/or diseases or disorders caused by protein
deficiencies.

In certain embodiments, the s-ship promoter regions are employed to direct
transcription in cells in a temporally or developmentally specific manner. Therefore, it is
contemplated that in some embodiments of the invention, a recombinant host cell contains a
heterologous nucleic acid sequence under the control of an s-ship promoter region and the
heterologous nucleic acid sequence is initially expressed but after the cell differentiates,
expression is limited or eliminated. In some embodiments, stem cells are used for ex vivo
therapy in which stem cells or a subset of progenitor cells are obtained either from a
non-recipient donor or from the recipient themselves, introduced with the s-ship
promoter-heterologous nucleic acid sequence, and then administered to the subject in which
therapy is needed. It is contemplated that the introduced cells could potentially provide
expression insofar as they did not become differentiated.

The invention has applicability also with respect to tumor cells as host cells for
nucleotides containing s-ship promoter regions. The idea of tumor stem cells has been around
for more than 40 years; however, more recent technology has given further credible support
to this concept. Initially, experiments demonstrated that about one in a million tumor cells
can initiate a tumor upon transplantation into autologous hosts.

Two models were proposed to account for these findings: 1) all tumor cells are alike
and stochastic events determine whether a cell might be a tumor initiating cell, and 2) not all
tumor cells are alike, but a very few (about 1 in 10^6 tumor cells) are inherently capable of tumor
initiation on transplantation into a suitable host. These models, termed variously stochastic
vs. hierarchy, nurture vs. nature, or probabilistic vs. deterministic, have been applied to
understanding many stem/developmental systems. Recently, however, these models of tumor
development have been tested using the more refined NOD/SCID mouse for transplantation.
Breast tumor-specific cell-surface markers (along with the absence of lineage markers (lin-)
of differentiated cells) have been used for identification and isolation of the few tumor
initiating cells within the breast tumor mass (Al-Hajj M, Wicha MS, Benito-Hernandez A,
Morrison SJ, Clarke MF. Prospective identification of tumorigenic breast cancer cells. Proc
Natl Acad Sci USA. 100:3983-8, 2003). In this case, specific populations capable of tumor
transplantation were identified, and these tumor cells were all lineage-minus. Further
experiments showed that the same tumor population could be isolated from the transplanted
animals, and that the transplanted cells had reestablished the same heterogeneity in the
transplanted tumor as found in the primary population. These data strongly support the
hierarchy model (#2, above) in which specific cells are destined for sustaining the
tumorigenic capability. The results above were obtained with breast tumors, but studies in
both brain and acute myeloid leukemia (AML) have supported the same characteristics of
tumor cells able to initiate tumors on transplantation (Lapidot et al., 1994; Singh et al., 2003;
Bhatia et al., 1997).

In the case of human AML, hematopoietic stem cell surface markers are well known
(CD34+, CD38-, Lin-, Thyl.R, c-Kitlo) and this "stem cell" fraction contained the
tumor-initiating cells. Results from all three tumor systems examined indicate that a tissue stem
cell might be the initial target for tumor formation. The properties of the tumor stem cells
identified by transplantation suggest that they sustain the tumor mass by self-replication;
however, partial differentiation of the tumor stem cell produces a vast majority of tumor cells
which can no longer sustain the tumor on transplantation. Thus, like normal stem cells, tumor
stem cells self-replicate and differentiate, producing a larger mass of differentiated but
non-transplantable tumor cells.

This theory of tumor stem cells as the primary source of tumor growth has important
implications for tumor therapy and potential stem cell therapies for correction of
tissue-dependent human diseases. The fact that the s-SHIP promoter expresses exclusively in
stem/progenitors in the embryo and adult suggests s-SHIP protein can be used to further
study tumor models and evaluate pathways and to drive expression of agents, including
therapeutic and diagnostic agents, in tumor cells, particularly those that qualify as tumor stem
cells.

C. Assays of Transgene Expression 

Assays may be employed with the instant invention for determination of the relative
efficiency of transgene expression. For example, assays may be used to determine the
efficacy of deletion mutants of the s-ship promoter in directing expression of exogenous
proteins. Similarly, one could produce random or site-specific mutants of the s-ship promoter
of the invention and assay the efficacy of the mutants in the expression of a given transgene.
Alternatively, assays could be used to determine the efficacy of the s-ship promoter in
directing protein expression when used in conjunction with various different enhancers,
terminators or other types of elements potentially used in the preparation of transformation
constructs.
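
The comparison of promoter mutants described above can be illustrated with a short Python
sketch. It assumes, purely for illustration, that each construct yields a reporter reading and
a co-transfected control reading used for normalization; the construct names, the numbers, and
the function `rank_constructs` are hypothetical and are not taken from this specification.

```python
def rank_constructs(readings, baseline="full_length"):
    """Rank promoter constructs by normalized reporter activity.

    readings maps a construct name to (reporter_signal, control_signal).
    Returns (name, activity relative to the baseline construct) pairs,
    sorted from highest to lowest relative activity.
    """
    normalized = {}
    for name, (reporter, control) in readings.items():
        if control <= 0:
            raise ValueError(f"control signal for {name} must be positive")
        # Dividing by the control reading corrects for transfection efficiency.
        normalized[name] = reporter / control

    base = normalized[baseline]
    if base <= 0:
        raise ValueError("baseline construct must have positive activity")

    relative = {name: value / base for name, value in normalized.items()}
    return sorted(relative.items(), key=lambda item: item[1], reverse=True)


# Hypothetical readings: (reporter signal, control signal) per construct.
example = {
    "full_length": (800.0, 100.0),
    "deletion_1": (400.0, 100.0),
    "deletion_2": (80.0, 100.0),
}
ranking = rank_constructs(example)
```

Expressing each mutant relative to the full-length construct is one common convention; other
baselines could be substituted through the `baseline` argument.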

For mammals, expression assays may comprise a system utilizing cell lines or,
alternatively, whole organisms. Additionally, assays of tissue- or developmental-specific
promoters are generally feasible.

The biological sample to be assayed may comprise nucleic acids isolated from the
cells of any plant material according to standard methodologies (Sambrook et al., 1989). The
nucleic acid may be genomic DNA or fractionated or whole cell RNA. Where RNA is used,
it may be desired to convert the RNA to a complementary DNA. In one embodiment of the
invention, the RNA is whole cell RNA; in another, it is poly-A RNA. Normally, the nucleic
acid is amplified.

Depending on the format, the specific nucleic acid of interest is identified in the
sample directly using amplification or with a second, known nucleic acid following
amplification. Next, the identified product is detected. In certain applications, the detection
may be performed by visual means (e.g., ethidium bromide staining of a gel). Alternatively,
the detection may involve indirect identification of the product via chemiluminescence,
radioactive scintigraphy of radiolabel or fluorescent label or even via a system using
electrical or thermal impulse signals (Affymax Technology; Bellus, 1994).

Following detection, one may compare the results seen in a given sample with a 
statistically significant reference group of non-transformed control cells. Typically, the non- 
transformed control cells will be of a genetic background similar to the transformed cells. In 
this way, it is possible to detect differences in the amount or kind of protein detected in 
various transformed cells.
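
By way of illustration only, the comparison of a transformed sample against a reference group of non-transformed control cells might be sketched as follows. The function name `z_score` and the expression values are hypothetical and do not come from the present disclosure; a standard score is only one of several ways such a comparison could be made.

```python
import statistics

def z_score(sample_value, control_values):
    """Distance of a sample from the control mean, in control standard deviations."""
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample standard deviation of the controls
    if sd == 0:
        raise ValueError("control values show no variation")
    return (sample_value - mean) / sd

# Hypothetical expression measurements (arbitrary units).
controls = [10.0, 12.0, 11.0, 9.0, 13.0]   # non-transformed control cells
z = z_score(19.0, controls)                # one transformed sample
```

A large absolute score suggests that the transformed sample differs from the control group; the threshold used to call a difference would be chosen by the investigator.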

As indicated, a variety of different assays are contemplated in the screening of cells or 
animals of the current invention and associated promoters. These techniques may in some cases be
used to detect both the presence and expression of the particular genes as well as
rearrangements that may have occurred in the gene construct. The techniques include, but are
not limited to, fluorescent in situ hybridization (FISH), direct DNA sequencing, pulsed field
gel electrophoresis (PFGE) analysis, Southern or Northern blotting, single-stranded
conformation analysis (SSCA), RNAse protection assay, allele-specific oligonucleotide
(ASO), dot blot analysis, denaturing gradient gel electrophoresis, RFLP and PCR™-SSCP.

1. Quantitation of Gene Expression with Relative Quantitative RT-PCR™

Reverse transcription (RT) of RNA to cDNA followed by relative quantitative PCR™ 
(RT-PCR™) can be used to determine the relative concentrations of specific mRNA species, 
for example, an mRNA whose expression is controlled by an s-ship promoter. By 
determining that the concentration of a specific mRNA species varies, it can be shown that 
the gene encoding the specific mRNA species is differentially expressed. In this way, a
promoter's expression profile can be rapidly identified, as can the efficacy with which the
promoter directs transgene expression. 

In PCR™, the number of molecules of the amplified target DNA increases by a factor
approaching two with every cycle of the reaction until some reagent becomes limiting.
Thereafter, the rate of amplification becomes increasingly diminished until there is no
increase in the amplified target between cycles. If a graph is plotted in which the cycle 
number is on the X axis and the log of the concentration of the amplified target DNA is on 
the Y axis, a curved line of characteristic shape is formed by connecting the plotted points. 
Beginning with the first cycle, the slope of the line is positive and constant. This is said to be 
the linear portion of the curve. After a reagent becomes limiting, the slope of the line begins 
to decrease and eventually becomes zero. At this point the concentration of the amplified 
target DNA becomes asymptotic to some fixed value. This is said to be the plateau portion of 
the curve. 
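
The shape of the amplification curve described above can be sketched with a simple saturating growth model. This is an illustrative assumption only: the function `simulate_pcr`, the per-cycle efficiency of 0.95 and the reagent-limited ceiling of 1e12 copies are hypothetical values chosen to reproduce the linear and plateau portions, not parameters taken from the present disclosure.

```python
import math

def simulate_pcr(start_copies, cycles, efficiency=0.95, reagent_limit=1e12):
    """Return the copy number after each cycle for a simple saturating PCR model."""
    copies = [float(start_copies)]
    for _ in range(cycles):
        current = copies[-1]
        # The per-cycle gain shrinks as product approaches the reagent-limited ceiling.
        gain = efficiency * current * max(0.0, 1.0 - current / reagent_limit)
        copies.append(current + gain)
    return copies

curve = simulate_pcr(start_copies=1000, cycles=45)
log_curve = [math.log10(c) for c in curve]

# Slope of log10(copies) per cycle, early (linear portion) and late (plateau portion).
early_slope = log_curve[10] - log_curve[9]
late_slope = log_curve[45] - log_curve[44]
```

In this sketch the early slope stays close to log10(1.95) per cycle, while the late slope falls to essentially zero once the ceiling is reached.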

The concentration of the target DNA in the linear portion of the PCR™ amplification 
is directly proportional to the starting concentration of the target before the reaction began. 
By determining the concentration of the amplified products of the target DNA in PCR™ 
reactions that have completed the same number of cycles and are in their linear ranges, it is 
possible to determine the relative concentrations of the specific target sequence in the original 
DNA mixture. If the DNA mixtures are cDNAs synthesized from RNAs isolated from 
different tissues or cells, the relative abundances of the specific mRNA from which the target 
sequence was derived can be determined for the respective tissues or cells. This direct
proportionality between the concentration of the PCR™ products and the relative mRNA
abundances is only true in the linear range of the PCR™ reaction. 
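
The proportionality described above can be illustrated with a short sketch, under the assumption that both reactions amplify with the same per-cycle efficiency and are sampled at the same cycle within their linear ranges. The helper names `product_after` and `relative_start_ratio` and the starting amounts are hypothetical.

```python
def product_after(start, cycles, efficiency=1.0):
    """Product amount after a number of cycles of unsaturated amplification."""
    return start * (1.0 + efficiency) ** cycles

def relative_start_ratio(product_a, product_b):
    """Ratio of starting target amounts for two reactions sampled at the same cycle."""
    if product_a <= 0 or product_b <= 0:
        raise ValueError("product amounts must be positive")
    return product_a / product_b

# Hypothetical cDNA preparations from two tissues, both sampled at cycle 20.
tissue_a = product_after(start=400.0, cycles=20)
tissue_b = product_after(start=100.0, cycles=20)
ratio = relative_start_ratio(tissue_a, tissue_b)
```

Because the common amplification factor cancels, the ratio of products equals the ratio of starting amounts (here, four to one). The same calculation would not hold for reactions sampled in the plateau portion.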

The final concentration of the target DNA in the plateau portion of the curve is 
determined by the availability of reagents in the reaction mix and is independent of the 
original concentration of target DNA. Therefore, the first condition that must be met before 
the relative abundances of a mRNA species can be determined by RT-PCR™ for a collection 
of RNA populations is that the concentrations of the amplified PCR™ products must be 
sampled when the PCR™ reactions are in the linear portion of their curves. 

The second condition that must be met for an RT-PCR™ study to successfully 
determine the relative abundances of a particular mRNA species is that relative 
concentrations of the amplifiable cDNAs must be normalized to some independent standard. 
The goal of an RT-PCR™ study is to determine the abundance of a particular mRNA species 
relative to the average abundance of all mRNA species in the sample. 

Most protocols for competitive PCR™ utilize internal PCR™ standards that are 
approximately as abundant as the target. These strategies are effective if the products of the 
PCR™ amplifications are sampled during their linear phases. If the products are sampled 
when the reactions are approaching the plateau phase, then the less abundant product
becomes relatively overrepresented. Comparisons of relative abundances made for many
different RNA samples, such as is the case when examining RNA samples for differential
expression, become distorted in such a way as to make differences in relative abundances of 
RNAs appear less than they actually are. This is not a significant problem if the internal 
standard is much more abundant than the target. If the internal standard is more abundant 
than the target, then direct linear comparisons can be made between RNA samples. 

The above discussion describes theoretical considerations for an RT-PCR™ assay for 
plant tissue. The problems inherent in plant tissue samples are that they are of variable 
quantity (making normalization problematic), and that they are of variable quality 
(necessitating the co-amplification of a reliable internal control, preferably of larger size than 
the target). Both of these problems are overcome if the RT-PCR™ is performed as a relative 
quantitative RT-PCR™ with an internal standard in which the internal standard is an 
amplifiable cDNA fragment that is larger than the target cDNA fragment and in which the 
abundance of the mRNA encoding the internal standard is roughly 5-100 fold higher than the 
mRNA encoding the target. This assay measures relative abundance, not absolute abundance 
of the respective mRNA species. 
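
As a minimal sketch of the normalization step in an assay with an internal standard, the signal for the target may be divided by the signal for the co-amplified standard in each sample before samples are compared. The band intensities and the function name `normalized_abundance` below are hypothetical and are given for illustration only.

```python
def normalized_abundance(target_signal, standard_signal):
    """Target signal expressed relative to the co-amplified internal standard."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal

# Hypothetical band intensities: (target, internal standard) for each sample.
samples = {
    "tissue_1": (120.0, 2400.0),
    "tissue_2": (30.0, 1200.0),
}
normalized = {name: normalized_abundance(t, s) for name, (t, s) in samples.items()}

# Relative (not absolute) difference in target mRNA abundance between the samples.
fold_change = normalized["tissue_1"] / normalized["tissue_2"]
```

Here the raw target signals differ four-fold, but after normalization to the internal standard the relative abundance differs about two-fold, reflecting the differing quantity of input material.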

Other studies may be performed using a more conventional relative quantitative RT- 
PCR™ assay with an external standard protocol. These assays sample the PCR™ products in 
the linear portion of their amplification curves. The number of PCR™ cycles that are optimal 
for sampling must be empirically determined for each target cDNA fragment. In addition, the
reverse transcriptase products of each RNA population isolated from the various tissue 
samples must be carefully normalized for equal concentrations of amplifiable cDNAs. This 
consideration is very important since the assay measures absolute mRNA abundance. 
Absolute mRNA abundance can be used as a measure of differential gene expression only in 
normalized samples. While empirical determination of the linear range of the amplification 
curve and normalization of cDNA preparations are tedious and time consuming processes, 
the resulting RT-PCR™ assays can be superior to those derived from the relative quantitative 
RT-PCR™ assay with an internal standard. 
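
One way the empirical determination of the linear range might be sketched is to sample the product at several cycle numbers and retain the cycles over which the log of the product rises at a nearly constant rate. The function `linear_range`, the 10% tolerance and the measurements are assumptions made for illustration, and the sketch further assumes that the first sampled interval lies within the linear portion of the curve.

```python
import math

def linear_range(cycles, product, tolerance=0.1):
    """Return the sampled cycles over which log10(product) rises at a near-constant rate."""
    logs = [math.log10(p) for p in product]
    slopes = [
        (logs[i + 1] - logs[i]) / (cycles[i + 1] - cycles[i])
        for i in range(len(logs) - 1)
    ]
    reference = slopes[0]  # assumed to lie in the linear portion
    last = 0
    for i, slope in enumerate(slopes):
        if abs(slope - reference) > tolerance * abs(reference):
            break  # slope has fallen away: the reaction is leaving the linear range
        last = i + 1
    return cycles[: last + 1]

# Hypothetical product measurements at the sampled cycle numbers.
cycles = [10, 14, 18, 22, 26, 30]
product = [1e3, 1.6e4, 2.56e5, 4.0e6, 2.0e7, 2.5e7]
usable = linear_range(cycles, product)
```

With these example values the sketch retains cycles 10 through 22 and discards cycles 26 and 30, where the curve flattens toward the plateau.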

One reason for this advantage is that without the internal standard/competitor, all of 
the reagents can be converted into a single PCR™ product in the linear range of the 
amplification curve, thus increasing the sensitivity of the assay. Another reason is that with 
only one PCR™ product, display of the product on an electrophoretic gel or another display 
method becomes less complex, has less background and is easier to interpret.

2. Marker Gene Expression

Marker genes represent an efficient means for assaying the expression of transgenes. 
Using, for example, a selectable marker gene, one could quantitatively determine the 
expression levels in the cell using a construct comprising the selectable marker coding region 
operably linked to the promoter to be assayed, e.g., an s-ship promoter. Alternatively,
particular cell types could be exposed to a selective agent and the relative resistance provided 
in these cells quantified, thereby providing an estimate of the tissue specific expression of the 
promoter. 

Screenable markers constitute another efficient means for quantifying the expression 
of a given transgene. Potentially any screenable marker could be expressed and the marker
gene product quantified, thereby providing an estimate of the efficiency with which the 
promoter directs expression of the transgene. Quantification can readily be carried out using 
either visual means, or, for example, a photon counting device. 

Preferred screenable marker gene assays for use with the current invention include
the use of the screenable marker genes β-galactosidase (β-gal), luciferase, or green fluorescent
protein (GFP).
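
For illustration, readings from a screenable marker such as luciferase could be converted to relative promoter activity by subtracting a background reading and dividing by the reading for a reference construct. The construct names, photon counts and the function `relative_promoter_activity` are hypothetical; the use of a promoterless construct as background is an assumption of this sketch rather than a requirement of the invention.

```python
def relative_promoter_activity(readings, reference, background=0.0):
    """Express each background-corrected reading as a fraction of the reference construct."""
    reference_signal = readings[reference] - background
    if reference_signal <= 0:
        raise ValueError("reference signal must exceed background")
    return {
        name: (value - background) / reference_signal
        for name, value in readings.items()
    }

# Hypothetical photon counts for a full-length promoter and two deletion mutants.
readings = {
    "full_length": 10100.0,
    "deletion_1": 5100.0,
    "deletion_2": 350.0,
    "promoterless": 100.0,
}
activity = relative_promoter_activity(
    readings, reference="full_length", background=readings["promoterless"]
)
```

In this example the first deletion mutant retains half of the full-length activity and the second retains only a few percent, which is the kind of comparison contemplated for deletion mutants of a promoter.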

3. Purification and Assays of Proteins 

One means for determining the efficiency with which a particular transgene is
expressed is to purify and quantify a polypeptide expressed by the transgene. Protein
purification techniques are well known to those of skill in the art. These techniques involve,
at one level, the crude fractionation of the cellular milieu to polypeptide and non-polypeptide
fractions. Having separated the polypeptide from other proteins, the polypeptide of interest
may be further purified using chromatographic and electrophoretic techniques to achieve
partial or complete purification (or purification to homogeneity). Analytical methods
particularly suited to the preparation of a pure peptide are ion-exchange chromatography;
exclusion chromatography; polyacrylamide gel electrophoresis; and isoelectric focusing. A
particularly efficient method of purifying peptides is fast protein liquid chromatography or
even HPLC.

Various techniques suitable for use in protein purification will be well known to those 
of skill in the art. These include, for example, precipitation with ammonium sulphate, PEG,
antibodies and the like or by heat denaturation, followed by centrifugation; chromatography
steps such as ion exchange, gel filtration, reverse phase, hydroxylapatite and affinity
chromatography; isoelectric focusing; gel electrophoresis; and combinations of such and
other techniques. As is generally known in the art, it is believed that the order of conducting
the various purification steps may be changed, or that certain steps may be omitted, and still 
result in a suitable method for the preparation of a substantially purified protein or peptide. 

There is no general requirement that the protein or peptide being assayed always be
provided in its most purified state. Indeed, it is contemplated that less substantially
purified products will have utility in certain embodiments. Partial purification may be
accomplished by using fewer purification steps in combination, or by utilizing different forms
of the same general purification scheme. For example, it is appreciated that a cation-exchange
column chromatography performed utilizing an HPLC apparatus will generally
result in a greater "-fold" purification than the same technique utilizing a low pressure
chromatography system. Methods exhibiting a lower degree of relative purification may
have advantages in total recovery of protein product, or in maintaining the activity of an
expressed protein. 
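
The trade-off between "-fold" purification and total recovery can be illustrated with a conventional purification table. The sketch below assumes that total protein and total activity have been measured after each step; the step names, the numerical values and the function `purification_summary` are hypothetical.

```python
def purification_summary(steps):
    """steps: list of (name, total_protein_mg, total_activity_units), crude extract first."""
    _, first_protein, first_activity = steps[0]
    base_specific = first_activity / first_protein
    rows = []
    for name, protein_mg, activity_units in steps:
        specific = activity_units / protein_mg  # activity units per mg of protein
        rows.append({
            "step": name,
            "specific_activity": specific,
            "fold": specific / base_specific,                       # "-fold" purification
            "yield_percent": 100.0 * activity_units / first_activity,  # recovery
        })
    return rows

# Hypothetical measurements after each purification step.
steps = [
    ("crude lysate", 1000.0, 50000.0),
    ("ammonium sulphate cut", 250.0, 40000.0),
    ("ion exchange", 20.0, 30000.0),
]
table = purification_summary(steps)
```

In this example the final step gives a 30-fold purification at a 60% yield, whereas stopping after the precipitation step would give only 3.2-fold purification but an 80% yield.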

It is known that the migration of a polypeptide can vary, sometimes significantly, with 
different conditions of SDS/PAGE (Capaldi et al., 1977). It will therefore be appreciated that
under differing electrophoresis conditions, the apparent molecular weights of purified or 
partially purified expression products may vary. 

High Performance Liquid Chromatography (HPLC) is characterized by a very rapid
separation with extraordinary resolution of peaks. This is achieved by the use of very fine
particles and high pressure to maintain an adequate flow rate. Separation can be
accomplished in a matter of minutes, or at most an hour. Moreover, only a very small
volume of the sample is needed because the particles are so small and close-packed that the
void volume is a very small fraction of the bed volume. Also, the concentration of the
sample need not be very great because the bands are so narrow that there is very little dilution
of the sample.

Gel chromatography, or molecular sieve chromatography, is a special type of partition 
chromatography that is based on molecular size. The theory behind gel chromatography is 
that the column, which is prepared with tiny particles of an inert substance that contain small 
pores, separates larger molecules from smaller molecules as they pass through or around the 
pores, depending on their size. As long as the material of which the particles are made does
not adsorb the molecules, the sole factor determining rate of flow is the size. Hence,
molecules are eluted from the column in decreasing size, so long as the shape is relatively
constant. Gel chromatography is unsurpassed for separating molecules of different size
because separation is independent of all other factors such as pH, ionic strength, temperature,
etc. There also is virtually no adsorption, less zone spreading and the elution volume is
related in a simple manner to molecular weight.
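
The simple relationship between elution volume and molecular weight is commonly approximated as linear in the logarithm of molecular weight over the working range of a column. The sketch below fits such a calibration line to standards and uses it to estimate the molecular weight of an unknown; the standards, volumes and function names are hypothetical, and the log-linear form is an assumption of the sketch.

```python
import math

def fit_calibration(standards):
    """Least-squares fit of elution volume = intercept + slope * log10(molecular weight)."""
    xs = [math.log10(mw) for mw, _ in standards]
    ys = [volume for _, volume in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def estimate_mw(elution_volume, intercept, slope):
    """Invert the calibration line to estimate molecular weight from elution volume."""
    return 10 ** ((elution_volume - intercept) / slope)

# Hypothetical standards: (molecular weight, elution volume in ml).
standards = [(1e4, 18.0), (1e5, 14.0), (1e6, 10.0)]
intercept, slope = fit_calibration(standards)
unknown_mw = estimate_mw(16.0, intercept, slope)
```

The fitted slope is negative, consistent with larger molecules eluting first, and an unknown eluting at 16 ml is estimated at roughly 32,000 in this example.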

Affinity Chromatography is a chromatographic procedure that relies on the specific 
affinity between a substance to be isolated and a molecule that it can specifically bind to. 
This is a receptor-ligand type interaction. The column material is synthesized by covalently
coupling one of the binding partners to an insoluble matrix. The column material is then able 
to specifically adsorb the substance from the solution. Elution occurs by changing the 
conditions to those in which binding will not occur (alter pH, ionic strength, temperature, 
etc.). 

A particular type of affinity chromatography useful in the purification of
carbohydrate-containing compounds is lectin affinity chromatography. Lectins are a class of
substances that bind to a variety of polysaccharides and glycoproteins.

The matrix should be a substance that itself does not adsorb molecules to any
significant extent and that has a broad range of chemical, physical and thermal stability. The
ligand should be coupled in such a way as to not affect its binding properties. The ligand
should also provide relatively tight binding. And it should be possible to elute the substance
without destroying the sample or the ligand. One of the most common forms of affinity
chromatography is immunoaffinity chromatography. The generation of antibodies that would
be suitable for use in accord with the present invention is well known to those of skill in the
art.

D. Methods of Gene Transfer 

Suitable methods for nucleic acid delivery to effect expression of compositions of the
present invention are believed to include virtually any method by which a nucleic acid (e.g.,
DNA, including viral and nonviral vectors) can be introduced into an organelle, a cell, a
tissue or an organism, as described herein or as would be known to one of ordinary skill in
the art. Such methods include, but are not limited to, direct delivery of DNA such as by
injection (U.S. Patents 5,994,624, 5,981,274, 5,945,100, 5,780,448, 5,736,524, 5,702,932,
5,656,610, 5,589,466 and 5,580,859, each incorporated herein by reference), including
microinjection (Harland and Weintraub, 1985; U.S. Patent 5,789,215, incorporated herein by
reference); by electroporation (U.S. Patent No. 5,384,253, incorporated herein by reference);
by calcium phosphate precipitation (Graham and Van Der Eb, 1973; Chen and Okayama,
1987; Rippe et al., 1990); by using DEAE-dextran followed by polyethylene glycol (Gopal,
1985); by direct sonic loading (Fechheimer et al., 1987); by liposome mediated transfection
(Nicolau and Sene, 1982; Fraley et al., 1979; Nicolau et al., 1987; Wong et al., 1980;
Kaneda et al., 1989; Kato et al., 1991); by microprojectile bombardment (PCT Application
Nos. WO 94/09699 and 95/06128; U.S. Patents 5,610,042, 5,322,783, 5,563,055, 5,550,318,
5,538,877 and 5,538,880, each incorporated herein by reference); by agitation with
silicon carbide fibers (Kaeppler et al., 1990; U.S. Patents 5,302,523 and 5,464,765, each
incorporated herein by reference); or by desiccation/inhibition-mediated DNA uptake
(Potrykus et al., 1985). Through the application of techniques such as these, organelle(s),
cell(s), tissue(s) or organism(s) may be stably or transiently transformed.

E. Transgenic and Knockout Animals 
1. Transgenic Animals 

It is further contemplated that transgenic animals are part of the present invention. A 
transgenic animal of the present invention may involve an animal in which an s-ship 
promoter drives the expression of a transgene. The transgene can be expressed temporally or 
spatially in a manner different than or the same as a non-transgenic animal. The transgene 
may also be heterologous with respect to the host cell or organism, such as, for example, the 
luciferase gene in a mammalian cell. Moreover, it is contemplated that the transgene may be 
expressed in a different tissue type or in a different amount or at a different time than the 
endogenously expressed version of the transgene. 

In a general aspect, a transgenic animal is produced by the integration of a given 
transgene into the genome in a manner that permits the expression of the transgene, or by 
disrupting the wild-type gene, leading to a knockout of the wild-type gene. Methods for 
producing transgenic animals are generally described by Wagner and Hoppe (U.S. Patent No. 
4,873,191; which is incorporated herein by reference), Brinster et al. (1985; which is 
incorporated herein by reference in its entirety) and in "Manipulating the Mouse Embryo; A 
Laboratory Manual" 2nd edition (eds., Hogan, Beddington, Costantini and Lacy, Cold
Spring Harbor Laboratory Press, 1994; which is incorporated herein by reference in its 
entirety). 

U.S. Patent 5,639,457 is also incorporated herein by reference to supplement the 
present teaching regarding transgenic pig and rabbit production. U.S. Patents 5,175,384; 
5,175,385; 5,530,179, 5,625,125, 5,612,486 and 5,565,186 are also each incorporated herein 
by reference to similarly supplement the present teaching regarding transgenic mouse and rat 
production. Transgenic animals may be crossed with other transgenic animals or knockout 
animals to evaluate phenotype based on compound alterations in the genome.

2. Knockout Animals or Cells

The generation of an animal model lacking s-ship or a particular nucleic acid
(encoding an RNA that is translated or not) is contemplated as part of the present invention to
understand further stem cell function. This strategy could also be implemented in cell
culture.

The lack of activity as a result of the knockout may provoke various types of
pathophysiological disturbances in a knockout animal or cell. This can be used to
characterize the role or function of a particular gene product at a particular time in
development or in a particular cell type. The s-ship promoter can be used to drive the
expression of the knockout gene such that only certain cells, for example stem cells, may be
affected. One method of inhibiting the endogenous expression of a particular gene in an
animal is to disrupt the gene in germline cells and produce offspring from these cells. This
method is generally known as knockout technology. U.S. Patent No. 5,616,491, incorporated
herein by reference in its entirety, generally describes the techniques involved in the
preparation of knockout mice, and in particular describes mice having a suppressed level of
expression of the gene encoding CD28 on T cells, and mice wherein the expression of the
gene encoding CD45 is suppressed on B cells. Pfeffer et al. (1993) describe mice in which
the gene encoding the tumor necrosis factor receptor p55 has been suppressed. The mice
showed a decreased response to tumor necrosis factor signaling. Fung-Leung et al. (1991a;
1991b) describe knockout mice lacking expression of the gene encoding CD8. These mice
were found to have a decreased level of cytotoxic T cell response to various antigens and to
certain viral pathogens such as lymphocytic choriomeningitis virus.

The term "knockout" refers to a partial or complete suppression of the expression of
at least a portion of a protein encoded by an endogenous DNA sequence in a cell. The term
"knockout construct" refers to a nucleic acid sequence that is designed to decrease or
suppress expression of a protein encoded by endogenous DNA sequences in a cell. The
nucleic acid sequence used as the knockout construct is typically comprised of: (1) DNA
from some portion of the gene (exon sequence, intron sequence, and/or promoter sequence)
to be suppressed, in conjunction with all or part of the s-ship promoter; and (2) a marker
sequence used to detect the presence of the knockout construct in the cell. The knockout
construct is inserted into a cell, and integrates with the genomic DNA of the cell in such a
position so as to prevent or interrupt transcription of the native DNA sequence. Such
insertion usually occurs by homologous recombination (i.e., regions of the knockout
construct that are homologous to endogenous DNA sequences hybridize to each other when
the knockout construct is inserted into the cell and recombine so that the knockout construct
is incorporated into the corresponding position of the endogenous DNA).

The knockout construct nucleic acid sequence may comprise 1) a full or partial 
sequence of one or more exons and/or introns of the gene to be suppressed, 2) a full or partial 
promoter sequence of the gene to be suppressed, or 3) combinations thereof. Typically, the
knockout construct is inserted into an embryonic stem cell (ES cell) and is integrated into the 
ES cell genomic DNA, usually by the process of homologous recombination. This ES cell is 
then injected into, and integrates with, the developing embryo. 

The phenotype of a mouse heterozygous for the knockout may lend clues as to the
function and importance of that gene or sequence, as well as contribute an understanding
about its physiological relevance, particularly with respect to disease states. Animals
completely lacking the targeted gene (homozygous null) may provide additional information.
Mice lacking the targeted gene may not be viable, which itself is indicative of the importance
of that gene. Should such mice be viable (heterozygous or homozygous nulls), they may be
crossed with other transgenic or knockout mice. Furthermore, knock-out mice having any
phenotype that resembles a disease state may be used to screen or test therapeutic drugs that
slow, modify, or cure conditions. As is known to the skilled artisan, a conditional knockout,
wherein the gene is disrupted under certain conditions, is frequently used.

3. Conditional Transgenic and Knockdown Animals and Cells 

The present invention further contemplates conditional transgenic or knockdown
animals (or cells in culture), such as those produced using recombination methods.
Bacteriophage P1 Cre recombinase and flp recombinase from yeast plasmids are two
non-limiting examples of site-specific DNA recombinase enzymes which cleave DNA at specific
target sites (lox P sites for cre recombinase and frt sites for flp recombinase) and catalyze a
ligation of this DNA to a second cleaved site. A large number of suitable alternative
site-specific recombinases have been described, and their genes can be used in accordance with
the method of the present invention. Such recombinases include the Int recombinase of
bacteriophage λ (with or without Xis) (Weisberg et al., 1983, herein incorporated by
reference); TpnI and the β-lactamase transposons (Mercier et al., 1990); the Tn3 resolvase
(Flanagan and Fennewald, 1989); the yeast recombinases (Matsuzaki et al., 1990); the B.
subtilis SpoIVC recombinase (Sato et al., 1990); the Flp recombinase (Schwartz and
Sadowski, 1989; Parsons et al., 1990; Golic and Lindquist, 1989; Amin et al., 1990); the Hin
recombinase (Glasgow et al., 1989); immunoglobulin recombinases (Malynn et al., 1988);
and the Cin recombinase (Haffter and Bickle, 1988; Hubner et al., 1989), all herein
incorporated by reference. Such systems are discussed (Echols, 1990; de Villartay, 1988;
Craig, 1988; Poyart-Salmeron et al., 1989; Hunger-Bertling et al., 1990; and Cregg and
Madden, 1989), all herein incorporated by reference.

Of particular interest in the present invention is the Cre recombinase. Cre has been
purified to homogeneity, and its reaction with the loxP site has been extensively characterized
(Abremski and Hess, 1984, herein incorporated by reference). Cre protein has a molecular
weight of 35,000 and can be obtained commercially from New England Nuclear/DuPont. The
cre gene (which encodes the Cre protein) has been cloned and expressed (Abremski et al.,
1983, herein incorporated by reference). The Cre protein mediates recombination between
two loxP sequences (Sternberg et al., 1981), which may be present on the same or different
DNA molecule. Because the internal spacer sequence of the loxP site is asymmetrical, two
loxP sites can exhibit directionality relative to one another (Hoess and Abremski, 1984).
Thus, when two sites on the same DNA molecule are in a directly repeated orientation, Cre
will excise the DNA between the sites (Abremski et al., 1983). However, if the sites are
inverted with respect to each other, the DNA between them is not excised after recombination
but is simply inverted. Thus, a circular DNA molecule having two loxP sites in direct
orientation will recombine to produce two smaller circles, whereas circular molecules having
two loxP sites in an inverted orientation simply invert the DNA sequences flanked by the
loxP sites. In addition, recombinase action can result in reciprocal exchange of regions distal
to the target site when targets are present on separate DNA molecules.
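
The dependence of the recombination outcome on loxP orientation can be sketched at the level of ordered sequence elements. In this illustrative model a DNA molecule is a list of named elements, a loxP site is written as "loxP>" or "<loxP" according to its orientation, and exactly two sites are present on the same molecule. The element names and the function `cre_recombine` are hypothetical, and the sketch does not model reciprocal exchange between separate molecules or the strand flip of individual elements on inversion.

```python
def cre_recombine(elements):
    """Return (resulting molecule, excised circle or None) for two loxP sites on one molecule."""
    sites = [i for i, e in enumerate(elements) if e in ("loxP>", "<loxP")]
    if len(sites) != 2:
        raise ValueError("this sketch handles exactly two loxP sites")
    first, second = sites
    between = elements[first + 1:second]
    if elements[first] == elements[second]:
        # Directly repeated sites: the intervening DNA is excised as a circle
        # carrying one loxP site, and one loxP site remains behind.
        remaining = elements[:first + 1] + elements[second + 1:]
        excised = between + [elements[second]]
        return remaining, excised
    # Inverted sites: the intervening DNA is reversed in place and nothing is excised.
    inverted = elements[:first + 1] + between[::-1] + elements[second:]
    return inverted, None

direct = ["promoter", "loxP>", "marker", "loxP>", "gene"]
after_excision, circle = cre_recombine(direct)

opposed = ["promoter", "loxP>", "exon_a", "exon_b", "<loxP", "gene"]
after_inversion, nothing = cre_recombine(opposed)
```

With directly repeated sites the marker is removed and a single loxP site is left between the promoter and the gene; with inverted sites the flanked elements are retained in reversed order.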

Recombinases have important application for characterizing gene function in
knockout models. When the constructs described herein are used to disrupt limulus clotting
factor protease-like genes, a fusion transcript can be produced when insertion of the positive
selection marker occurs downstream (3') of the translation initiation site of the limulus
clotting factor protease-like gene. The fusion transcript could result in some level of protein
expression with unknown consequence. It has been suggested that insertion of a positive
selection marker gene can affect the expression of nearby genes. These effects may make it
difficult to determine gene function after a knockout event since one could not discern
whether a given phenotype is associated with the inactivation of a gene, or the transcription
of nearby genes. Both potential problems are solved by exploiting recombinase activity.
When the positive selection marker is flanked by recombinase sites in the same orientation,
the addition of the corresponding recombinase will result in the removal of the positive
selection marker. In this way, effects caused by the positive selection marker or expression of
fusion transcripts are avoided.

III. Proteinaceous Compositions 

In certain embodiments, the present invention concerns novel compositions 
comprising at least one proteinaceous molecule, such as s-SHIP1, SHIP1, or a modulator of
an s-ship1 promoter. As used herein, a "proteinaceous molecule," "proteinaceous
composition," "proteinaceous compound," "proteinaceous chain" or "proteinaceous material" 
generally refers, but is not limited to, a protein of greater than about 200 amino acids or the 
full length endogenous sequence translated from a gene; a polypeptide of greater than about 
100 amino acids; and/or a peptide of from about 3 to about 100 amino acids. All the 
"proteinaceous" terms described above may be used interchangeably herein. 

In certain embodiments the size of the at least one proteinaceous molecule may 
comprise, but is not limited to, about 5, about 6, about 7, about 8, about 9, about 10, about 11,
about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, 
about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, 
about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, 
about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, 
about 48, about 49, about 50, about 51, about 52, about 53, about 54, about 55, about 56, 
about 57, about 58, about 59, about 60, about 61, about 62, about 63, about 64, about 65, 
about 66, about 67, about 68, about 69, about 70, about 71, about 72, about 73, about 74, 
about 75, about 76, about 77, about 78, about 79, about 80, about 81, about 82, about 83, 
about 84, about 85, about 86, about 87, about 88, about 89, about 90, about 91, about 92, 
about 93, about 94, about 95, about 96, about 97, about 98, about 99, about 100, about 110, 
about 120, about 130, about 140, about 150, about 160, about 170, about 180, about 190, 
about 200, about 210, about 220, about 230, about 240, about 250, about 275, about 300, 
about 325, about 350, about 375, about 400, about 425, about 450, about 475, about 500, 
about 525, about 550, about 575, about 600, about 625, about 650, about 675, about 700, 
about 725, about 750, about 775, about 800, about 825, about 850, about 875, about 900, 
about 925, about 950, about 975, about 1000, about 1100, about 1200, about 1300, about 
1400, about 1500, about 1750, about 2000, about 2250, about 2500 or greater amino 
molecule residues, and any range derivable therein. 

As used herein, an "amino molecule" refers to any amino acid, amino acid derivative 
or amino acid mimic as would be known to one of ordinary skill in the art. In certain
embodiments, the residues of the proteinaceous molecule are sequential, without any non-amino
molecule interrupting the sequence of amino molecule residues. In other
embodiments, the sequence may comprise one or more non-amino molecule moieties. In 
particular embodiments, the sequence of residues of the proteinaceous molecule may be 
interrupted by one or more non-amino molecule moieties. 

Accordingly, the term "proteinaceous composition" encompasses amino molecule 
sequences comprising at least one of the 20 common amino acids in naturally synthesized 
proteins, or at least one modified or unusual amino acid. 

Proteinaceous compositions may be made by any technique known to those of skill in 
the art, including the expression of proteins, polypeptides or peptides through standard 
molecular biological techniques, the isolation of proteinaceous compounds from natural 
sources, or the chemical synthesis of proteinaceous materials. The nucleotide and protein, 
polypeptide and peptide sequences for various genes have been previously disclosed, and 
may be found at computerized databases known to those of ordinary skill in the art. One such 
database is the National Center for Biotechnology Information's Genbank and GenPept 
databases (http://www.ncbi.nlm.nih.gov/). The coding regions for these known genes may be
amplified and/or expressed using the techniques disclosed herein or as would be known to
those of ordinary skill in the art. Alternatively, various commercial preparations of proteins, 
polypeptides and peptides are known to those of skill in the art. 

In certain embodiments a proteinaceous compound may be purified. Generally,
"purified" will refer to a specific protein, polypeptide, or peptide composition that has
been subjected to fractionation to remove various other proteins, polypeptides, or peptides,
and which composition substantially retains its activity, as may be assessed, for example, by
protein assays known to one of ordinary skill in the art for the specific or
desired protein, polypeptide or peptide.

It is contemplated that virtually any protein-, polypeptide- or peptide-containing
component may be used in the compositions and methods disclosed herein. However, it is
preferred that the proteinaceous material is biocompatible. 

IV. Therapeutic Applications 

The invention is widely applicable to a variety of situations where it is desirable to be 
able to regulate the level of gene expression, such as by turning gene expression "on" and
"off," in a rapid, efficient and controlled manner without causing pleiotropic effects or
cytotoxicity. The invention may be particularly useful for gene therapy purposes in humans, 
in treatments for either genetic or acquired diseases. The general approach of gene therapy
involves the introduction of one or more nucleic acid molecules into cells such that one or 
more gene products encoded by the introduced genetic material are produced in the cells to 
restore or enhance a functional activity. For reviews on gene therapy approaches, see Anderson et
al. (1992); Miller et al. (1992); Friedmann et al. (1989); and Cournoyer et al. (1990).
However, current gene therapy vectors typically utilize constitutive regulatory elements
which are responsive to endogenous transcription factors. These vector systems do not allow
for the ability to modulate the level of gene expression in a subject. In contrast, the regulatory 
system of the invention provides this ability. 

To use the system of the invention for gene therapy purposes, at least one DNA 
molecule is introduced into cells of a subject in need of gene therapy (e.g., a human subject 
suffering from a genetic or acquired disease) to modify the cells. The cells are modified to 
comprise: 1) nucleic acid encoding an inducible regulator of the invention in a form suitable 
for expression of the inducible regulator in the host cells; and 2) an siRNA (e.g., for
therapeutic purposes) operatively linked to a tissue-specific promoter such as an s-ship
promoter. A single DNA molecule encoding components of the regulatory system of the 
invention can be used, or alternatively, separate DNA molecules encoding each component 
can be used. The cells of the subject can be modified ex vivo and then introduced into the
subject or the cells can be directly modified in vivo by conventional techniques for 
introducing nucleic acid into cells. Thus, the regulatory system of the invention offers the 
advantage over constitutive regulatory systems of allowing for modulation of the level of 
gene expression depending upon the requirements of the therapeutic situation. 

Genes of particular interest to be knocked down or knocked out in cells of a subject 
for treatment of genetic or acquired diseases include those encoding a deleterious gene 
product, such as an abnormal protein. Examples of non-limiting specific diseases include 
anemia, blood-related cancers, Parkinson's disease, and diabetes. 

The present invention can be applied to develop autologous or allogeneic cell lines for 
therapeutic purposes. For example, gene therapy applications of particular interest in cell
and/or organ transplantation are utilized with the present invention. In exemplary
embodiments, downregulation of transplantation antigens (such as, for example, by 
downregulation of beta2-microglobulin expression via siRNA) allows for transplantation of 
allogeneic cells while minimizing the risk of rejection by the patient's immune system. The 
present invention would allow for a switch off of the RNAi in case of adverse effects (e.g. 
uncontrollable replication of the transplanted cells). 

Cell types that can be subjected to the present invention include hematopoietic stem
cells, myoblasts, hepatocytes, lymphocytes, airway epithelium, skin epithelium, islets,
dopaminergic neurons, keratinocytes, and so forth. For further descriptions of cell types,
genes and methods for gene therapy see, e.g., Armentano et al. (1990); Wolff et al. (1990);
Chowdhury et al. (1991); Ferry et al. (1991); Quantin et al. (1992); Dai et al. (1992); van
Beusechem et al. (1992); Rosenfeld et al. (1992); Kay et al. (1992); Cristiano et al. (1993);
Hwu et al. (1993); and Herz and Gerard (1993).

In particular embodiments of the present invention, there is a method of treating any 
disease condition amenable to treatment with an s-ship promoter. In specific embodiments, 
the method comprises preparing a polynucleotide construct having a region encoding a
therapeutic or diagnostic (marker) gene that is operably linked to an s-ship promoter,
wherein the gene encoded by the construct is for the treatment of the disease condition. 

A. Pharmaceutical Formulations, Delivery, and Treatment Regimens 

In an embodiment of the present invention, methods of treatment are contemplated. 
An effective amount of the pharmaceutical composition, generally, is defined as that amount
sufficient to detectably and repeatedly ameliorate, reduce, minimize or limit the extent of
the disease or its symptoms. More rigorous definitions may apply, including elimination, 
eradication or cure of disease. 

The routes of administration will vary, naturally, with the location and nature of the 
lesion, and include, e.g., intradermal, transdermal, parenteral, intravenous, intramuscular,
intranasal, subcutaneous, percutaneous, intratracheal, intraperitoneal, intratumoral, perfusion, 
lavage, direct injection, and oral administration and formulation. 

Solutions of the active compounds as free base or pharmacologically acceptable salts 
may be prepared in water suitably mixed with a surfactant, such as hydroxypropylcellulose. 
Dispersions may also be prepared in glycerol, liquid polyethylene glycols, and mixtures
thereof and in oils. Under ordinary conditions of storage and use, these preparations contain 
a preservative to prevent the growth of microorganisms. The pharmaceutical forms suitable 
for injectable use include sterile aqueous solutions or dispersions and sterile powders for the 
extemporaneous preparation of sterile injectable solutions or dispersions (U.S. Patent 
5,466,468, specifically incorporated herein by reference in its entirety). In all cases the form
must be sterile and must be fluid to the extent that easy syringability exists. It must be stable 
under the conditions of manufacture and storage and must be preserved against the 
contaminating action of microorganisms, such as bacteria and fungi. The carrier can be a 
solvent or dispersion medium containing, for example, water, ethanol, polyol (e.g., glycerol,
propylene glycol, and liquid polyethylene glycol, and the like), suitable mixtures thereof,
and/or vegetable oils. Proper fluidity may be maintained, for example, by the use of a 
coating, such as lecithin, by the maintenance of the required particle size in the case of 
dispersion and by the use of surfactants. The prevention of the action of microorganisms can 
be brought about by various antibacterial and antifungal agents, for example, parabens, 
chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it will be 
preferable to include isotonic agents, for example, sugars or sodium chloride. Prolonged 
absorption of the injectable compositions can be brought about by the use in the compositions 
of agents delaying absorption, for example, aluminum monostearate and gelatin. 

For parenteral administration in an aqueous solution, for example, the solution should 
be suitably buffered if necessary and the liquid diluent first rendered isotonic with sufficient 
saline or glucose. These particular aqueous solutions are especially suitable for intravenous, 
intramuscular, subcutaneous, intratumoral and intraperitoneal administration. In this 
connection, sterile aqueous media that can be employed will be known to those of skill in the 
art in light of the present disclosure. For example, one dosage may be dissolved in 1 ml of 
isotonic NaCl solution and either added to 1000 ml of hypodermoclysis fluid or injected at 
the proposed site of infusion (see, for example, "Remington's Pharmaceutical Sciences" 15th
Edition, pages 1035-1038 and 1570-1580). Some variation in dosage will necessarily occur 
depending on the condition of the subject being treated. The person responsible for 
administration will, in any event, determine the appropriate dose for the individual subject. 
Moreover, for human administration, preparations should meet sterility, pyrogenicity, general 
safety and purity standards as required by FDA Office of Biologics standards.

Sterile injectable solutions are prepared by incorporating the active compounds in the 
required amount in the appropriate solvent with various of the other ingredients enumerated 
above, as required, followed by filtered sterilization. Generally, dispersions are prepared by 
incorporating the various sterilized active ingredients into a sterile vehicle which contains the 
basic dispersion medium and the required other ingredients from those enumerated above. In 
the case of sterile powders for the preparation of sterile injectable solutions, the preferred 
methods of preparation are vacuum-drying and freeze-drying techniques which yield a 
powder of the active ingredient plus any additional desired ingredient from a previously 
sterile-filtered solution thereof. 

The compositions disclosed herein may be formulated in a neutral or salt form.
Pharmaceutically-acceptable salts include the acid addition salts (formed with the free amino
groups of the protein) and which are formed with inorganic acids such as, for example, 
hydrochloric or phosphoric acids, or such organic acids as acetic, oxalic, tartaric, mandelic,
and the like. Salts formed with the free carboxyl groups can also be derived from inorganic 
bases such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, 
and such organic bases as isopropylamine, trimethylamine, histidine, procaine and the like. 
Upon formulation, solutions will be administered in a manner compatible with the dosage
formulation and in such amount as is therapeutically effective. The formulations are easily 
administered in a variety of dosage forms such as injectable solutions, drug release capsules 
and the like. 

As used herein, "carrier" includes any and all solvents, dispersion media, vehicles, 
coatings, diluents, antibacterial and antifungal agents, isotonic and absorption delaying
agents, buffers, carrier solutions, suspensions, colloids, and the like. The use of such media 
and agents for pharmaceutical active substances is well known in the art. Except insofar as 
any conventional media or agent is incompatible with the active ingredient, its use in the 
therapeutic compositions is contemplated. Supplementary active ingredients can also be 
incorporated into the compositions.

The phrase "pharmaceutically-acceptable" or "pharmacologically-acceptable" refers 
to molecular entities and compositions that do not produce an allergic or similar untoward 
reaction when administered to a human. The preparation of an aqueous composition that 
contains a protein as an active ingredient is well understood in the art. Typically, such 
compositions are prepared as injectables, either as liquid solutions or suspensions; solid
forms suitable for solution in, or suspension in, liquid prior to injection can also be prepared.

B. Combination Treatments

The compounds and methods of the present invention may be used in the context of
traditional therapies. In order to increase the effectiveness of a treatment with the
compositions of the present invention, it may be desirable to combine these compositions
with other agents effective in the treatment of those diseases and conditions. For example,
the treatment of a cancer may be implemented with therapeutic compounds of the present
invention and other anti-cancer therapies, such as anti-cancer agents or surgery. Likewise,
the treatment of a vascular disease or condition may involve both the present invention and
conventional vascular agents or therapies.

Various combinations may be employed; for example, a host cell of the present 
invention is "A" and the secondary anti-cancer agent/therapy is "B": 

B/B/B/A B/B/A/B A/A/B/B A/B/A/B A/B/B/A B/B/A/A

B/A/B/A B/A/A/B A/A/A/B B/A/A/A A/B/A/A A/A/B/A
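
The four-administration patterns listed above are orderings of the two treatments "A" and "B". As a minimal sketch only (the function name and the choice of four administrations per regimen are illustrative assumptions, not part of the disclosure), such orderings could be enumerated as follows:

```python
from itertools import product


def combination_schedules(n_slots: int = 4) -> list[str]:
    """Enumerate A/B orderings of length n_slots that use both treatments.

    "A" stands for the host cell of the invention and "B" for the secondary
    agent/therapy, as in the text. n_slots is an illustrative assumption.
    """
    schedules = []
    for combo in product("AB", repeat=n_slots):
        # Keep only regimens that actually combine the two treatments.
        if "A" in combo and "B" in combo:
            schedules.append("/".join(combo))
    return schedules


if __name__ == "__main__":
    for schedule in combination_schedules():
        print(schedule)
```

Changing `n_slots` gives shorter or longer regimens; the enumeration says nothing about which ordering is clinically preferable.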

Administration of the therapeutic expression constructs of the present invention to a 
patient will follow general protocols for the administration of that particular secondary 
therapy, taking into account the toxicity, if any, of the treatment. It is expected that the 
treatment cycles would be repeated as necessary. It also is contemplated that various
standard therapies, as well as surgical intervention, may be applied in combination with the 
described therapy. 

V. EXAMPLES 

The following examples are included to demonstrate preferred embodiments of the 
invention. It should be appreciated by those of skill in the art that the techniques disclosed in
the examples which follow represent techniques discovered by the inventor to function well 
in the practice of the invention, and thus can be considered to constitute preferred modes for 
its practice. However, those of skill in the art should, in light of the present disclosure, 
appreciate that many changes can be made in the specific embodiments which are disclosed 
and still obtain a like or similar result without departing from the spirit and scope of the
invention.

EXAMPLE 1: 
Materials and Methods 

Cell growth and transfection conditions 

NIH3T3 cells, originally obtained from the American Type Culture Collection
(ATCC, Rockville, Maryland), were grown in DMEM with 10% fetal bovine serum. The D3
embryonic stem (ES) cell line was obtained from Dr. Tasuku Honjo (Nakano et al., 1994) and
grown in high glucose DMEM (GIBCO/Invitrogen Corp., #11965-092) supplemented with 2
mM L-glutamine, 1 mM sodium pyruvate, 0.1 mM nonessential amino acids, 0.15 mM
monothioglycerol (Sigma, M7522), and 15% fetal bovine serum (pre-tested for ES cell
growth (HyClone Labs, Inc.)). D3 ES cells were routinely grown on a LIF-producing feeder
layer of mitomycin C-treated (Nagy et al., 2003) SNL cells, obtained from Phil Soriano
(FHCRC). The SNL cells are G418-resistant. Usually, one passage before flow cytometry, ES
cells were transferred to gelatin (Sigma)-coated plates without a feeder layer and with LIF
(ESGRO) added to the medium (1000 units/ml).

DNA was transfected into D3 ES cells by electroporation essentially as described by
Nagy et al. (2003). ES cells were suspended in PBS (Ca2+- and Mg2+-free) at 1 x 10^6 cells/ml
and 0.8 ml of the cell suspension placed in a 0.4-cm-wide electrode-gap sterile cuvette
(BIO-RAD). Plasmid DNA (20 µg), linearized by overnight digestion with AflII and
Qiagen-purified, was added and mixed. Two pulses (instead of one as recommended) of current were
applied to the cells in the cuvette employing settings of 500 µF and 230 V on a BIO-RAD
Gene-Pulser™ with Capacitance Extender. After 5 min on ice, the viscous solution was
transferred to a 10-cm culture dish containing mitomycin C-treated SNL cells. After 24 hr,
G418 selection was begun using 280 µg/ml active G418. Cells were passaged after 10-14 days
onto gelatin-coated plates (no feeder cells) in LIF-containing medium with G418. Flow
cytometry was performed 3-4 days later. 
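
The quantities stated in the electroporation protocol above imply a cell number per cuvette and a DNA-to-cell ratio. The following is a small worked calculation using only the stated values (1 x 10^6 cells/ml, 0.8 ml of suspension, 20 µg of plasmid DNA); the function and variable names are illustrative and not part of the original protocol.

```python
def electroporation_inputs(cells_per_ml: int, volume_ul: int, dna_ng: int):
    """Return (cells per cuvette, ng of DNA per 10^6 cells).

    Integer inputs (µl and ng) are used to avoid floating-point rounding.
    """
    cells = cells_per_ml * volume_ul // 1000      # convert µl to ml
    dna_ng_per_million = dna_ng * 1_000_000 / cells
    return cells, dna_ng_per_million


if __name__ == "__main__":
    # Values taken from the protocol text: 1 x 10^6 cells/ml, 0.8 ml, 20 µg DNA.
    cells, ratio = electroporation_inputs(1_000_000, 800, 20_000)
    print(f"{cells:,} cells per cuvette")
    print(f"{ratio / 1000:.0f} µg DNA per 10^6 cells")
```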

AflII-linearized plasmid DNA (10 µg) was introduced into NIH3T3 cells by
transfection using Superfect reagent (Qiagen) as recommended by the manufacturer. G418
selection was begun 24 hr after transfection using 400 µg/ml G418. Cells were passaged
twice in G418 before flow cytometry. Whether by electroporation into ES cells or
transfection into the NIH3T3 cells, abundant G418-resistant colonies were obtained for each
cell type. 

Two positive control GFP-expression plasmids were used for both NIH3T3 cells and
the D3 ES cells to be sure the transfection/electroporation steps were functional and that GFP
expression occurred in each experiment. These positive controls also helped set the gates for
analyses of GFP-expressing cells. These two plasmids were the pIRES2-GFP empty plasmid
(BD Biosciences Clontech) and pIRES2-GFP containing an insert encoding the Capn5 gene.
Both plasmids expressed equally well in each cell type, and the empty pIRES2-GFP vector
always expressed higher levels of GFP than the one containing the insert.

Immunoblotting analysis for SHIP proteins

The techniques for cell extraction, electrophoresis, and immunoblotting have been 
described previously (Liu et al., 2001). Equal amounts of protein extracts from each cell type
were loaded for gel electrophoresis. SHIP proteins were detected using monoclonal antibody
P2C6 at a 1:1000 dilution (Lucas and Rohrschneider, 1999).

Flow cytometry

Cells were examined for GFP expression on a Caliber U bench-top analyzer.
Cytometer settings were established using positive FDC-P1 cells expressing GFP from a
retroviral vector and negative cells, not transfected, or transfected with an empty plasmid. At
least 10^4 cells were analyzed for each plasmid transfected, and two independent transfections
were examined. Both transfections gave similar results, and the results of one experiment are 
shown. 

Construction of promoter-less GFP-expression constructs for analysis of s-SHIP
intron-5 promoter activity

A 7.6-kb DNA SacI-SacI fragment from a Lambda 129Sv mouse genomic clone
(Wolf et al., 2000, NCBI accession #AF235499, hereby incorporated by reference) was used
for initial examination of potential tissue-specific promoter activity. This region contained
almost all of intron-5, the 88 bp of exon-6, and 1271 bp extending into intron-6. This 7.6-kb
segment was cloned into pBluescript KS (Stratagene), and sub-segments of the region were
obtained with the restriction sites shown in FIG. 2. These sub-segments were cloned into a
promoter-less GFP-expression construct. 

The promoter-less GFP-expression construct was made from the pEGFP-1 plasmid
(BD Biosciences Clontech) by modifications of the MCS (multiple cloning site)
incorporating additional synthesized cloning sites (EcoRI-AccI(up)-BssHII-NheI-PstI) for
insertion of the sub-fragments from the 7.6 kb intron-5 clone. Both AccI and BssHII
recognize multiple sequences and the nucleotide sequence in the synthesized DNA
corresponded to the AccI site at nucleotide 2776 of the 7.6-kb region, and the 5' BssHII site of
the pBluescript plasmid, respectively. In addition, prior to incorporation of the extended
MCS, the SV40 early and late introns from pCMVβ were inserted at the 3' end of the MCS
between the KpnI and AgeI sites. Two intron cassettes were used: one containing only the
splice acceptor site from the long intron, and a second containing both early and late introns.
The former was used only for inserts (e.g., the 7.6-kb and 4.2-kb inserts) containing an intact
exon 6 with its splice donor site. The two final plasmids each containing the extended MCS
and either the late SV40 intron only (pEGFP2-SD3-1), or both SV40 introns
(pEGFP2-SD1-2), were sequenced through the inserted intron region and one of each with correct sequence
selected for inserting the 7.6-kb clone and sub-regions.

The longest promoter construct contained the entire 7.6-kb putative s-SHIP promoter
region, and was excised from the pBluescript plasmid with BssHII for insertion into the MCS
of the pEGFP-SD3-1 plasmid. The 6.3-kb fragment was obtained with a partial PstI digestion
and complete BssHII digestion. The 4.4-kb and 4.2-kb fragments were derived from
PstI and AccI digestions, respectively. The 1.9-kb segment was obtained from digestion of
the 4.4 kb fragment with NheI. The smallest 0.96 kb region was produced by deleting a
region of the pBluescript 7.6 kb clone from the SwaI site 960 nucleotides 5' of exon 6, to the
FbaI site 22 nucleotides from the 5' end of the 7.6 kb clone. After ligation, the fragment from
the 5' BssHII site to the PstI site was excised. Each fragment was inserted into its
respective restriction sites of the extended MCS. Restriction analysis of each purified plasmid
confirmed the correct insert in the correct orientation, and all cloning junctions were
sequenced to confirm proper ligation. Each plasmid was linearized with AflII and
Qiagen-purified from agarose gels before electroporation or transfection.

Construction of the 11.5kb- and 6.2kb-GFP s-SHIP promoter transgenes 

The 11.5kb-GFP transgenic construct was prepared from two separate plasmids
containing the two halves of the proposed s-SHIP promoter region, plus an 833 nt sequence
from a lambda genomic clone, which was inserted between these two halves. The genomic
organization of SHIP1 is shown in Wolf et al. (2000). The starting genomic clone contained
a 4 kb region from the SacI site near the 3' end of the 7.6 kb genomic clone in intron 6,
extending through exon 8 and into intron 8. This SacI-SacI fragment was cloned into the SacI
site of pBluescript SK (pBSK). The GFP gene from pEGFP-1 (Invitrogen/Clontech) was
excised with NcoI (encompassing the ATG translation start site of GFP) and SspI. This was
ligated into the NcoI (the putative s-SHIP translation start site in exon 7) and EcoRV sites of
the pBSK-4kb clone. Next, the 5' half of the genomic promoter was added in the form of the
SacI-SacI 7.6 kb genomic sub-clone. This was inserted into the one remaining SacI site at the
5' end of the intron 6-exon 7-GFP clone in pBSK. This left a gap of 0.9 kb between the two
SacI sites in intron 6 (see Wolf et al., 2000). This region was recovered as a larger
BsiWI-EcoRI 2117 nt fragment, whose sequence demonstrated the insertion of 833 nucleotides
between two SacI sites. Therefore, this BsiWI-EcoRI fragment was inserted into the same
unique sites of the transgenic construct to produce the finished 11.5kb-GFP transgene in
pBSK.

The 6.2kb-GFP transgene-construct was prepared from the 11.5kb-GFP transgene
prior to the insertion of the 833 nt at the intron 6 SacI site. This 11.5kb(Δ833)-GFP construct
was digested with FbaI and SwaI, removing 5.3 kb from the 5' end of intron 5. Re-ligation
removed all but 19 intron 5 nt at the 5' end of the 11.5kb-GFP transgene. Both 11.5kb-GFP
and 6.2kb-GFP transgenes, in pBSK, were cut from the plasmid with BssHII and
Qiagen-purified from an agarose gel for introduction into the mouse genome.

Production of transgenic mice

Founder transgenic mice were prepared in our Transgenic Mouse Facility by
pronuclear injection of fertilized zygotes from (C57Bl/6 female X CBA/J male) F1 mice.
Mice were screened for the transgene by PCR using DNA obtained from tails or
toes of young animals. The location of the primer set for PCR is shown in FIG. 3: the
upstream primer (a), within intron 6 (Pro-up2, 5'-TACTCCTCAGCAAGAGTAGCTGG-3')
(SEQ ID NO:12), and the downstream primer (b), within the GFP gene (GFP-dn1,
5'-GCTGAACTTGTGGCCGTTTACGT-3') (SEQ ID NO:13), produce a 632 nucleotide (nt)
product. These primers were used for detection of both 6.2kb-GFP and 11.5kb-GFP
transgenic mice. Positive chimeric mice were bred to C57Bl/6 mice and four founder lines
(A, B, C and D) obtained for the 11.5kb-GFP mice. Later analyses demonstrated that founder
line B was not positive for GFP expression, even though the primer pair a and b gave a
positive 632 nt product. Therefore, line B is not included in further analyses. The other lines
were maintained by breeding transgene-positive animals with wild-type C57Bl/6 mice. For
some experiments transgene-positive offspring were generated from positive intra-line
breeding. Two founder animals were obtained for the 6.2kb-GFP transgene but one was lost.

The transgene copy number in each founder line (except 11.5kb-GFP, line B) was
determined by semi-quantitative RT-PCR of transgene expression relative to endogenous
Gab2 expression. Primers for detecting genomic gab2 are: E4F,
5'-CTTCTATAGCCTTCCCAAGCC-3' (SEQ ID NO:14); E5R,
5'-CTCGTAGGTCTCACAGGAAG-3' (SEQ ID NO:15).
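
As a quick consistency check on the genotyping and gab2 primers given above, their lengths and GC fractions can be tabulated. This is a minimal sketch: the sequences are copied from the text, while the helper names and dictionary labels are illustrative assumptions and not part of the original methods.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequences as given in the text (5' to 3'); the labels are illustrative.
PRIMERS = {
    "upstream primer (a), intron 6": "TACTCCTCAGCAAGAGTAGCTGG",
    "downstream primer (b), GFP": "GCTGAACTTGTGGCCGTTTACGT",
    "E4F (gab2)": "CTTCTATAGCCTTCCCAAGCC",
    "E5R (gab2)": "CTCGTAGGTCTCACAGGAAG",
}


def summarize(primers: dict) -> dict:
    """Map each primer label to (length in nt, GC fraction)."""
    return {name: (len(seq), gc_fraction(seq)) for name, seq in primers.items()}


if __name__ == "__main__":
    for name, (length, gc) in summarize(PRIMERS).items():
        print(f"{name}: {length} nt, GC = {gc:.1%}")
```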

Analysis of embryos 

Preimplantation embryos were harvested at 2.5 and 3.5 dpc from uterine horns of
pregnant females [see Nagy et al. (2003) for details of these methods]. The morulae and
blastocysts were washed in RPMI 1640 medium (Gibco) containing 10% fetal bovine serum,
transferred to PBS (Ca2+ and Mg2+), and GFP-expression or phase images photographed on a
Nikon Eclipse TE200 inverted microscope coupled to a Roper Scientific 1k x 1k pixel digital
camera. Images were captured with MetaMorph software and prepared for publication with
Photoshop (Adobe). High-resolution z-sections of GFP expression within embryos were
made with a Leica TCS SP Confocal microscope.

Several blastocysts were plated onto gelatin-coated tissue-culture wells in DME 10% 
fetal bovine serum, and photographed three days later. During this period, blastocysts hatched 
from the zona pellucida, and attached to the culture plate. The attached mass of 
trophectoderm cells with the non-adherent ICM was photographed for GFP and phase with a 
Nikon Eclipse TE200 microscope. 

RT-PCR analysis of s-SHIP expression in blastocysts 

mRNA was isolated from wild-type 3.5 dpc blastocysts, FDC-P1 cells and the D3 ES 
cells using a Dynabeads mRNA DIRECT micro kit (Dynal). Reverse transcription used the 
Sensiscript kit from Qiagen, and the PCR cycling conditions were as follows: 94°C 1 min, 
[94°C 15 sec, 68°C 2 min] x 30 cycles, 68°C 5 min, and a 4°C hold. Each reaction used the 
equivalent of 1.5 ng mRNA, based on the concentration before reverse transcription. Primer
pairs were:

HPRT-up1, 5'-CCTGCTGGATTACATTAAAGCACTG-3' (SEQ ID NO:16),
HPRT-down1, 5'-GTCAAGGGCATATCCAACAACAAAC-3' (SEQ ID NO:17);

OCT4-Up1, 5'-GGCGTTCTCTTTGGAAAGGTGTTC-3' (SEQ ID NO:18),
OCT4-Down1, 5'-CTCGAACCACATCCTTCTCT-3' (SEQ ID NO:19);

SHIP1/s-SHIP pair #3,

SHIP-E8FW, 5'-TTGCTGCACGAGGGCTCAGAATC-3' (SEQ ID NO:20),
SSP883RV, 5'-TCCGATTCTCATGCTCTGGCTTG-3' (SEQ ID NO:21);

SHIP1/s-SHIP pair #4,

SP2109FW, 5'-CAGCCCTGTCTTTGCCACGTTTG-3' (SEQ ID NO:22),
SP2637RV, 5'-TCCACTGGATTCATCCCGCTCTG-3' (SEQ ID NO:23);

SHIP1/s-SHIP pair #5,

newfw, 5'-CTTCCTCTTGCAACAGAGAACCC-3' (SEQ ID NO:24),
newrv, 5'-ACTCAACGTCCACTTTGAGATGC-3' (SEQ ID NO:25).
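
The cycling conditions stated before the primer list (94°C for 1 min; 30 cycles of 94°C for 15 sec and 68°C for 2 min; 68°C for 5 min) determine a minimum programmed run time. The sketch below simply adds up the programmed step durations; it ignores ramp times between temperatures and the final 4°C hold, and the function name is an illustrative assumption.

```python
def pcr_program_seconds(initial_s: int, cycle_steps_s: list, n_cycles: int, final_s: int) -> int:
    """Total programmed time in seconds, excluding ramping and the final hold."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s


if __name__ == "__main__":
    # 94°C 1 min, [94°C 15 sec, 68°C 2 min] x 30 cycles, 68°C 5 min
    total = pcr_program_seconds(60, [15, 120], 30, 300)
    print(f"{total} s = {total / 60:.1f} min")
```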

EXAMPLE 2:

Identification and Characterization of the s-SHIP Promoter

Potential s-SHIP promoter activity was first analyzed in cell lines grown in culture. 
Several cell lines were tested for s-SHIP vs. SHIP1 protein expression, based on the known 
and expected expression pattern of the s-SHIP protein (Lioubin et al., 1994; Tu et al., 2001).
These results showed the expression of the ~104-kDa s-SHIP only in the ES cells, whereas
the 145-kDa SHIP1 product was exclusively expressed in the maturing FD-Fms myeloid
cells. Hot SDS-extraction of the ES cells did not change the size of the s-SHIP protein,
suggesting that this 104-kDa product is not the result of proteolytic degradation during
extraction (Horn et al., 2001). SHIP proteins were not detectable in NIH3T3 fibroblasts, the
SNL cells serving as feeder for the ES cell growth, or the 293 human kidney cells. Therefore, 
NIH3T3 cells and D3 ES cells were selected as negative and positive cells, respectively, for 
analysis of the potential s-SHIP promoter activity. 

A 7.6-kb genomic ship1 region containing the intron-5 region was obtained for initial
promoter analysis. The entire 7.6-kb region and sub-fragments thereof were cloned into a
promoter-less GFP (enhanced green-fluorescent protein) expression vector (FIG. 1).
Promoter activity of the intron-5 region was then assayed in the cells positive for s-SHIP
expression (embryonic stem cells, clone D3) vs. cells negative for s-SHIP expression
(NIH3T3 cells). The expression of GFP in each cell type, assayed by flow cytometry, was a
measure of the promoter activity within each fragment of the 7.6 kb genomic DNA. The
results indicated that, whereas empty vectors alone lacked significant promoter activity in
either cell type, vectors containing intron-5 segments exhibited substantial expression in the
D3 ES cells but not in the NIH3T3 cells. Segments of intron 5, ranging from 0.96 kb to 7.6
kb, were active for GFP expression in the ES cells; however, the shorter segments appeared
most active. Two fragments of 1.9 kb and 0.96 kb, immediately upstream of exon 6, each
exhibited equally high GFP expression. The shortest insert fragment contained part of exon 6,
but only the 44 nucleotides upstream of exon 6 (Tu et al., 2001), and was completely without
promoter activity. These results strongly suggest that the intron-5 region of genomic ship1
contains cell-specific promoter activity, and segments more distal to exon 6 may have
negative regulatory activity.

Based on the ES/NIH3T3 cell-transfection experiments, two new constructs with an
extended region downstream of the intron-5 genomic area were prepared for in vivo analysis
of promoter activity in transgenic mice (FIG. 3A). Transgenic mice were produced for in
vivo examination of the putative s-SHIP promoter/enhancer activity, and for determining the
overall expression pattern of the transgene, and presumably s-SHIP protein. The promoter in
the longer of the new constructs (the 11.5kb-GFP transgene) contained the entire intron 5
from the above 7.6-kb genomic fragment, plus all of exon 6, intron 6, and the portion of exon
7 ending at the theoretical ATG start site (Kozak, 1987) for the s-SHIP protein translation.
This start site was fused, in frame, to the ATG for the GFP protein. All of intron 6 and part of
exon 7 were included in this construct because 1) the construct might then more closely
resemble the endogenous promoter, 2) splicing may be important for efficient expression
(Nott et al., 2004), and 3) positive or negative regulatory elements for expression may also
reside within this sequence. The second, shorter, transgenic promoter construct (the
6.2kb-GFP transgene) was similar, but contained only 0.96 kb of intron 5 sequence adjacent to exon
6, and also lacked 833 nucleotides between two SacI sites within intron 6. Thus, if either
construct contained promoter activity in vivo, transcription would start within intron 5, while
intron 6 would be spliced out and translation of GFP would begin at the first ATG within an
appropriate Kozak site.

Transgenic (Tg) mice were then produced in the Hutchinson Center transgenic
mouse facility, and chimeric animals were screened for each transgene by PCR. Breeding each
founder to wild-type C57BL/6 mice yielded four lines containing the 11.5kb-GFP transgene
and one line with the 6.2kb-GFP transgene. Of the four founder Tg11.5kb-GFP mice, one
was negative for expression of the transgene (line B), while three were positive and each
exhibited the same expression pattern (lines A, C and D). Copy numbers of genomic
transgenes, measured relative to the endogenous gab2 gene, are shown in FIG. 3B. Among the
three GFP-expressing 11.5kb-GFP founder lines, empirical results indicate that line C
exhibits noticeably the highest GFP expression levels. Line C mice also exhibit lower birth
rates, with in utero death apparent at 8.5-9.5 days postcoitum (dpc). The single 6.2kb-GFP
founder line harbors the most transgene copies, but no overt defects in the physical
appearance of these mice, their birth rate, or their development have been observed.

Experiments were then conducted with the adult transgenic 11.5kb-GFP mice to
examine transgene expression; however, it was initially difficult to find any GFP expressed in
these mice by flow cytometry of blood and stem cell-enriched bone marrow. After several
negative attempts to find GFP expression, it was reasoned that because ES cell expression
was readily detectable in the initial ES cell experiments, the best test for in vivo expression
would be the inner cell mass (ICM) of the blastocyst, from which ES cells can be derived.
Therefore, we looked for GFP expression in 3.5-dpc blastocysts derived from mating of Tg
males x WT females. Blastocysts derived from one such cross produced 9 GFP-positive
embryos, indicating that the Tg male was homozygous for the transgene. A separate Tg male
bred to a WT female produced both positive and negative blastocysts. GFP-positive morulae
were also obtained from similar crosses, whereas blastocysts or morulae from WT parents were
negative for GFP.
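
The inference of homozygosity from an all-positive set of embryos can be made roughly
quantitative. The following is a minimal sketch only. It assumes (none of this is stated in
the text) a single transgene insertion locus, Mendelian transmission, that all nine scored
embryos were GFP-positive, and that every transgene-carrying embryo is detectably
fluorescent. The function name is hypothetical.

```python
from fractions import Fraction


def prob_all_positive_if_hemizygous(n_embryos: int) -> Fraction:
    """Chance that a hemizygous Tg male x WT female cross gives only Tg embryos.

    Assumption: one insertion locus, so each embryo inherits the transgene
    with probability 1/2, independently of the others.
    """
    if n_embryos < 0:
        raise ValueError("n_embryos must be non-negative")
    return Fraction(1, 2) ** n_embryos


# Nine positive embryos out of nine scored (assumed reading of the text).
p = prob_all_positive_if_hemizygous(9)
print(p, float(p))  # 1/512 0.001953125
```

Under these assumptions a hemizygous male would produce nine positive embryos out of nine
only about 0.2% of the time, which is why such a result points to a homozygous male.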

Blastocysts are composed of 2-3 cell types depending on their developmental stage.
The outer trophectoderm layer of cells surrounds the eccentric inner cell mass (ICM)
destined to become the embryo proper, and later stage blastocysts also contain endodermal
cells separating the ICM from the blastocoel cavity (Nagy et al., 2003). To obtain a better
idea of which cells of the blastocyst express the GFP transgene, transgenic 3.5-dpc
blastocysts were allowed to adhere to a culture dish by three days growth in DME with 10% FBS.
Under these conditions, the zona pellucida is shed, and the outer trophectoderm cells of the
blastocyst form an adherent layer while the ICM remains as an unspread mass, and each is
distinguishable morphologically from the other. The results showed that the ICM portion of
the blastocyst retained the GFP expression while the adherent trophectoderm cells were
largely GFP-negative.

A more detailed picture of GFP expression throughout the intact early pre-implantation
embryos was seen in confocal Z-sections of GFP within transgenic 2.5-dpc
morulae and 3.5-dpc blastocysts. All cells of the 16- to 32-cell morula were GFP-positive.
Transition of the morula to the early blastocyst is marked by the formation of the blastocoel
cavity. A few cells of this early blastocyst structure began to shut off GFP expression, and
the extent of this GFP shut-off was more evident in the late blastocyst. Here, the outer
trophectoderm cells had noticeably lower GFP expression, and the GFP-positive cells were
confined to the ICM. Endodermal cells were not readily apparent. In these images it is
helpful to remember that the half-life of the GFP fluorescence is greater than 24 hr (Tech
Brochure, BD Biosciences Clontech), and therefore cells that have stopped expressing
GFP will retain some GFP protein and fluorescence for several days. Twenty-four hours
separates the morula from the blastocyst stages; therefore, transgene shut-off early during this
time would result in lower, but not completely absent, GFP fluorescence late in this time span.
The 11.5-kb transgene s-SHIP promoter thus contains the information for both cell-specific
positive expression in the morula and ICM of the blastocyst and cell-specific shut-off in
trophectoderm cells.
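
The half-life argument above can be illustrated with a short calculation. This is a sketch
under stated assumptions, not a measurement: it assumes simple first-order decay of
fluorescence after transcription stops, ignores dilution by cell division, and uses 24 hr as
the half-life even though the text gives it only as a lower bound. The function name is
hypothetical.

```python
def residual_gfp_fraction(hours_since_shutoff: float, half_life_hr: float = 24.0) -> float:
    """Fraction of the original GFP fluorescence remaining after shut-off.

    Assumption: first-order decay with a fixed half-life. Because the real
    half-life is stated to be greater than 24 hr, the default gives a lower
    bound on the remaining fraction.
    """
    if hours_since_shutoff < 0:
        raise ValueError("hours_since_shutoff must be non-negative")
    if half_life_hr <= 0:
        raise ValueError("half_life_hr must be positive")
    return 0.5 ** (hours_since_shutoff / half_life_hr)


# Shut-off at the start of the 24-hr morula-to-blastocyst interval.
print(residual_gfp_fraction(24.0))  # 0.5
```

Under these assumptions, a trophectoderm cell that shut off the transgene at the morula stage
would still show at least about half of its original fluorescence in the blastocyst, which is
consistent with the reduced but not absent signal described.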

Preimplantation embryos from the Tg6.2kb-GFP mice were analyzed next. The
transgene in these mice contained only the proximal 0.96-kb region upstream of exon 6,
which was necessary for GFP expression in the ES cells. It also lacked 833 nucleotides
between two SacI sites of the intron-6 region. GFP expression in the 3.5-dpc blastocyst of the
6.2kb-GFP line was analyzed. Both qualitative and quantitative features of GFP expression in
the Tg6.2kb-GFP blastocysts differed from those in the Tg11.5kb-GFP mice. First, GFP
expression in the Tg6.2kb-GFP blastocysts was noticeably stronger (at least 5-fold) than that
in the Tg11.5kb-GFP blastocysts, as measured by exposure times for obtaining equivalent
GFP images in the Nikon digital microscope. Second, and more noticeable, was the lack of
GFP shut-off in the trophectoderm cells of the blastocyst. No clear demarcation in GFP
expression was evident between ICM and trophectoderm, in contrast to the Tg11.5kb-GFP
blastocysts.

Blastocysts from the Tg6.2kb-GFP mice were also allowed to adhere to culture plates
and GFP expression was examined. Adherent blastocysts from Tg11.5kb-GFP mice were
examined simultaneously. Adherent Tg6.2kb-GFP blastocysts expressed GFP in both ICM
and trophectoderm cells in a frequently haphazard pattern. The Tg11.5kb-GFP adherent
blastocysts expressed GFP only in the ICM, as observed previously. A comparison of all
embryos examined revealed increased GFP expression within the
adherent Tg6.2kb-GFP blastocysts relative to the adherent Tg11.5kb-GFP blastocysts. These
results were consistent with the promoter analyses performed in the ES cells (FIG. 1), and
suggested that the lack of GFP shut-off by the 6.2kb-GFP transgene was due to negative
regulatory information found in one or both regions of the 11.5kb-GFP construct that are
missing from the 6.2kb-GFP transgene.

The data from Tu et al. (2001) and those presented herein demonstrated exclusive s-SHIP
(rather than SHIP1) expression in ES cells; yet, even though ES cells are derived from
the ICM of the blastocyst and the intron 5 s-SHIP promoter functioned well in the ICM, it
was still not certain whether the ICM actually expressed s-SHIP in vivo. Consequently,
s-SHIP mRNA expression was analyzed by RT-PCR and compared to that of the universally
expressed HPRT and the ES cell- and ICM-specific Oct4 transcription factor. RNA from
blastocysts, FDC-P1 myeloid progenitor cells, and D3 ES cells was positive for HPRT as
expected, and only the blastocysts and ES cells were positive for Oct4. Initially,
s-SHIP-specific primers similar to those described by Tu et al. (2001) were used to test for
s-SHIP expression; however, poor results were obtained. The forward primer in this set was
moved 3'-ward into the region identical to SHIP1, but detection remained weak. s-SHIP was
therefore detected by "subtraction," using primers detecting both s-SHIP and SHIP1 products
vs. primers detecting only the SHIP1 product. These primers clearly demonstrated the
presence of full-length SHIP1 only in the FDC-P1 cells, and s-SHIP in both blastocysts and
ES cells. The weak detectability of s-SHIP may be due to poor hybridization of the primers,
degradation of the 5' s-SHIP mRNA ends, or possibly an additional shorter transcription
product from the ship1 gene.
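
The "subtraction" logic can be written out explicitly. The sketch below is illustrative
only: the function name, the result strings, and the boolean readouts are hypothetical, and
the readouts are an idealized restatement of the outcome described in the text rather than
measured data.

```python
def call_isoforms(common_positive: bool, ship1_only_positive: bool) -> str:
    """Interpret two RT-PCR readouts for one RNA sample.

    common_positive:     product seen with primers shared by s-SHIP and SHIP1
    ship1_only_positive: product seen with primers specific to full-length SHIP1
    """
    if ship1_only_positive and not common_positive:
        # SHIP1 message should also amplify with the shared primers.
        return "inconsistent"
    if ship1_only_positive:
        return "SHIP1 present (s-SHIP not excluded)"
    if common_positive:
        return "s-SHIP present, SHIP1 absent"
    return "neither detected"


# Idealized readouts matching the described outcome (not raw data).
samples = {
    "FDC-P1": (True, True),
    "D3 ES cells": (True, False),
    "3.5-dpc blastocysts": (True, False),
}
for name, (common, ship1_only) in samples.items():
    print(name, "->", call_isoforms(common, ship1_only))
```

The design point is that a sample positive with the shared primers but negative with the
SHIP1-specific primers is assigned to s-SHIP by elimination, which is why the FDC-P1 control
is needed to show that the SHIP1-specific primers work.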

Examination of the minimal 0.96-kb promoter proximal to exon 6 by MatInspector
indicated several transcription-factor binding sites potentially active in ES cells and the
blastocyst ICM. FIG. 4 shows the first 600 nucleotides of this region upstream of exon 6,
with potential transcription factor binding sites and motifs for transcriptional regulation
marked. A transcription initiator sequence (Butler and Kadonaga, 2002) straddles the 5' end
of the 44 nt SSR, suggesting a transcriptional start site. Prominent motifs include paired
GATA (or Lmo2) binding sites, two overlapping p53 and Oct-binding sites, and a single
extended FOX-factor binding region. The Oct-binding motif is present in similar regions of
both the murine and human s-SHIP promoter, suggesting such a factor could be important for
ES and ICM expression. The POU factor Oct4 is expressed in ES cells and is part of an
enhancer for ES cell-specific expression of target genes (Dailey et al., 1994). Therefore the
Oct site could be part of a similar ES cell enhancer region.

The transgene expression in preimplantation embryos raises a question about possible
progenitor transgene expression in the oocytes or sperm of the adult, which then give rise to
the fertilized embryo. The transcription factor Oct4 is expressed in adult and embryonic
germ cells, as well as in the blastocyst ICM and in ES cells (Pesce et al., 1998). The possibility
that the 11.5kb-GFP transgene could also be germ cell specific is even more likely given the
prominent Oct4 binding motif within the 0.96 kb minimal promoter upstream of exon 6 (see
FIG. 4). Therefore, ovaries and testes from 7-8 week old adult Tg11.5kb-GFP mice were
harvested, and frozen sections were stained with Alexa 594-labeled phalloidin to visualize
tissue structure through polymerized actin staining and examined for endogenous GFP
expression. The results of this experiment demonstrated that neither the developing sperm of
the testis nor the developing oocytes of the ovarian follicles expressed GFP. Only blood
vessels of the testes and ovaries exhibited specific GFP expression. Therefore, unlike the
Oct4 transcription factor, the 11.5kb-GFP transgene is not a maternally activated gene and
must be transcriptionally activated sometime after the germ cells leave the ovary/testis and
before the 2.5-dpc morula stage of development.

EXAMPLE 3: 

Further Characterization of the s-SHIP Promoter

The transgenic mice that were generated from the experiments described in Example
2 were further analyzed by immunofluorescence. Embryos were harvested, washed, and fixed
2-4 hr in 2% paraformaldehyde in PBS, then washed in 30% sucrose in PBS and stored in
that solution overnight at 4° C. Embryos were frozen in O.C.T. on dry ice and stored at -80° C
until sectioned. Twenty-µm sections on Superfrost/Plus (Fisherbrand) microscope glass
slides were air-dried overnight and stored, desiccated, at -20° C. Sections were routinely
stained with a rabbit anti-GFP antibody coupled to Alexa 488 (Molecular Probes) to enhance
the transgenic GFP detection. For general screening, sections were also stained with
phalloidin coupled to Alexa 594 (Molecular Probes) for detecting morphology by filamentous
actin staining. Other antibodies used were specific for: CD45, Cy-Chrome labeled, and Flk1
(VEGFR2), phycoerythrin labeled (both from Pharmingen); Oct4, mouse monoclonal (Santa
Cruz Biotechnology); E-cadherin, rat monoclonal (Zymed); and alpha smooth muscle actin,
mouse monoclonal (Sigma). The BM alkaline phosphatase detection reagent was from
Roche. The tissue sections were blocked in 5% fetal bovine serum for 30 min, washed in
PBS, treated 10 min in 0.5% TX-100 in PBS, then washed again in PBS. The primary
antibodies were applied and sections incubated at RT in a humidified chamber for 45-60 min.
Sections were washed 3 times, 10 min each, in PBS, and secondary antibodies were added and
incubated as before. Final washing was 3 times 15 min, and sections were mounted in
ProLong (Molecular Probes). All secondary antibodies were from Molecular Probes and
labeled with Alexa 594 or Alexa 633. Slides were viewed with a Leitz TCS SP Confocal
microscope.

s-SHIP expression was tested in blastocysts by RT-PCR and compared to that of the
universally expressed HPRT and the ES cell- and ICM-specific Oct4 transcription factor
(Pesce et al., 1998). FDC-P1 myeloid progenitor cells express SHIP1 but not s-SHIP (Lucas
and Rohrschneider, 1999; Tu et al., 2001), while conversely, D3 ES cells express only s-SHIP
(Tu et al., 2001; data not shown). These two cell types represented the positive and negative
controls. RNA from FDC-P1 cells, D3 ES cells, and 3.5-dpc blastocysts was tested and each was
positive for HPRT, while mRNA from the blastocysts and ES cells, but not from the FDC-P1
cells, was positive for Oct4, as expected. s-SHIP was detected by "subtraction" using primers
common to both s-SHIP and SHIP1 vs. primers detecting only the SHIP1 product. These
primers demonstrate the presence of full-length SHIP1 only in the FDC-P1 cells, and s-SHIP
in both blastocysts and ES cells.

Analysis of transgene expression in the post-implantation mouse embryo

Ongoing development of the blastocyst following implantation of the embryo
continues with the formation of the epiblast (the embryo body) from cells comprising the
blastocyst ICM. Consistent with this derivation, the epiblast in the Tg11.5kb-GFP E6 embryo
retained GFP expression; however, within 24 hr, epiblast GFP expression was lost. E7-7.5
embryos exhibited individual GFP-positive cells, or groups of cells, in the extraembryonic
membranes, often appearing to trail from the epiblast. By 8.5 dpc, GFP expression could no
longer be seen in the embryo body itself, but both the yolk sac and placenta contained
numerous GFP-positive cells.

Serial sections through the transgenic E8.5 decidua and growing embryo have not
detected specific GFP expression in any tissue or cells of the embryo body. Specifically,
GFP-positive migrating primordial germ cells (PGCs) have not been detected in the hindgut
region of the E8.5 embryos (not shown, but see later results). Within the extraembryonic
regions, however, the ectoplacental plate (also called chorioallantoic plate) contained patches
of GFP-positive cells, and both the blood islands and endoderm cell layer of the yolk sac
contained individual or small groups of GFP-positive cells. Occasional intensely GFP+ round
cells within the primitive erythrocyte-filled blood islands were observed, and groups of GFP+
cells adjacent to blood islands were also seen. GFP+ cells of the peripheral ectoplacental plate
sometimes appeared contiguous with GFP+ endodermal cells of the yolk sac. Maternal
placental contributions did not account for the yolk sac or ectoplacental plate expression
profiles, because the same GFP expression patterns were observed in embryos from Tg males
mated to WT females (not shown). This GFP expression pattern in the yolk sac is similar to
that reported using an enhancer derived from the Scl (Tal1) stem cell protein (Sanchez et al.,
1999), suggesting that these GFP+ cells may be related to hemangioblasts. However,
Scl (Tal1) expression was not detected in the E8.5 GFP+ extraembryonic membrane cells.

In contrast to the lack of GFP expression within the 8.5-dpc embryo proper, whole-mount
observations of 11.5-dpc Tg11.5kb-GFP embryos showed dramatic GFP expression in
the caudal region of the embryos and a distinct pattern of GFP expression on, and around,
each hindlimb/forelimb pair. This complex pattern represented multiple distinct expression
sites. At this age of development, the strongest and most broadly observed site of GFP
expression was the epidermal cell layer of the developing skin. This was seen in whole
mounts but demonstrated best in frozen sections of E11.5 transgenic embryos. The E11.5
transgenic embryos exhibited extensive GFP expression in the body and limbs, but such
expression was not seen in the head region at this embryonic stage. A second
epidermis-related GFP expression site was the apical ectodermal ridge (AER) of each
developing limb, and a third distinct pattern was the mammary buds. These latter structures
form by invaginations of the skin epidermis and were observed as five GFP-positive spots
between each forelimb/hindlimb pair, corresponding to the bilateral three thoracic and two
inguinal developing mammary glands. These were seen in whole mount embryos or underneath
the dissected epidermis.

An additional prominent GFP expression site in 11.5-dpc Tg11.5kb-GFP embryos
was the genital ridge, where PGCs were accumulating. The dissected aorta-gonad-mesonephros
(AGM) region from an E13.5 embryo demonstrated the localization of the GFP+
cells within the gonads. The GFP+ cells co-expressed the nuclear transcription factor Oct4,
identifying these GFP+ cells as PGCs. GFP expression was not detected above
background in the liver and dorsal aorta. The dorsal aorta is also present in the dissected AGM
between the two gonads, but specific GFP expression was not observable at this site. The
absence of GFP expression in the dorsal aorta suggests a lack of any relationship to definitive
hematopoiesis, reportedly arising from this region (Dzierzak, Medvinsky, and de Bruijn,
1998).

Primordial germ cells 

GFP expression in the PGCs of the Tg11.5kb-GFP embryo was examined in more
detail during the E9.5-E18.5 stages of embryonic development. Using Oct4 as a marker for
PGCs, the temporal co-expression of GFP and Oct4 was followed from embryonic day 9.5 to
18.5. Consistent with earlier results, GFP was not expressed in the earliest E9.5 PGCs
migrating along the hindgut, although Oct4 was present in their nuclei (FIG. 5Aa,b,c). At
E11.5, PGCs of the genital ridges contained GFP and nuclear Oct4. Both E13.5 and E17.5
PGCs were GFP+ and Oct4+. Together with the earlier analyses of the E8.5 embryos, these
results suggest that GFP is not expressed in the early migrating PGCs of the E8.5-9.5 embryo,
but is readily detected in the PGCs of the genital ridge and gonads of the E11.5-17.5
embryos.

GFP expression was observed in PGCs from an E13.5 embryo ovary. Many of the
GFP+ PGCs at this stage were in cell division. Primitive seminiferous tubules are
distinguishable in the E15.5 testes, populated with GFP+ PGCs and developing spermatogenic
cells (Kaufman, 2001). The brightest GFP+ cells at this stage were in mitosis, and these may
represent the type A spermatogonia undergoing cell division (Kaufman, 2001). The
seminiferous tubules of the E18.5 embryo were, likewise, filled with GFP+ spermatogenic
cells. Sertoli cells attached to the basement membrane of the seminiferous tubule were
GFP-negative. Germ cells in both ovaries and testes were positive for GFP in the E13.5-E18.5
stages of embryo development. However, quite surprisingly, neither ovaries nor testes of
adults expressed GFP in any stage of germ cell formation. These observations indicate that
the expression of the 11.5kb-GFP transgene is positively regulated during the E11.5-18.5
developmental stages, but negatively regulated in the adult.

11.5kb-GFP transgene expression in other tissues from E15.5-E18.5 embryos

Following E11.5 in the Tg11.5kb-GFP embryo, new GFP+ structures are observed.
Specifically, the developing cornea of the 15.5-dpc embryo eye was GFP+. The cornea is
formed from an outer epithelial layer and an inner cell layer derived from neural crest cells. 
Only the outer epithelial layer showed GFP expression. The retina, derived from 
neuroepithelium, did not express GFP, while the lens, although an invagination of the 
epidermis, was likewise GFP-negative. 

E15.5 embryos began to exhibit GFP expression in cells surrounding blood vessels,
and this expression became more noticeable at E18.5. The expression was restricted to
smaller vessels and was not seen in larger arteries (e.g., aorta) or veins. The
vessel-associated GFP was expressed in cells wrapping around the circumference of the vessels.
This characteristic suggests the GFP+ cells are smooth muscle cells, and this notion was
supported by the observation that alpha-smooth muscle actin co-localized with the GFP+
vessel cells. GFP was not detectable in all vascular smooth muscle cells (vSMCs), and was
also not detected in cardiac or skeletal muscle cells. This indicates a highly
specific regulation of the 11.5kb-GFP transgene only in a population of the smooth muscle
cells associated with small blood vessels.

A few scattered cells in the E15.5 thymus were also positive for GFP.
Morphologically, these cells did not appear to be blood derived, but rather adhered to the
thymus stroma via E-cadherin, suggesting an epithelial nature.

The paired vomeronasal (Jacobson's) organ in the nasal septum also expressed GFP. 
Jacobson's organ is derived from the neuroepithelium, again suggesting a potential 
connection between transgene expression and epidermal-derived tissues. 

The E17.5-18.5 transgenic embryos exhibited GFP expression in cells associated with
the forming bone matrix. These small, irregularly shaped GFP+ cells are attached to the newly
formed bone and are most likely the osteoblasts in the process of converting extracellular
matrix material to bone (Komori et al., 1997; Long et al., 2004). This notion was confirmed
by demonstrating the colocalization of alkaline phosphatase and GFP in these cells.
Trabecular bone also exhibited alkaline phosphatase-positive/GFP+ osteoblasts.

Transgene expression in embryonic epidermis, tissues derived from epidermis, and 
other epithelial cells 

Embryonic skin is a major stem/progenitor cell population essential for formation of
several appendages and tissues, such as hair follicles, sweat glands, mammary tissue, and
(more indirectly) prostate. Therefore, these epidermal-derived tissues were examined for GFP
expression in Tg11.5kb-GFP embryonic mice.

Hair follicle formation initiates around E13.5 by reciprocal inductive signals between
skin epidermis and underlying mesenchyme (Miller, 2002). The resultant hair follicle
placodes exhibit a localized epidermal thickening in a geometrically ordered array due to
inhibitory signaling from each placode (Andl et al., 2002). The results demonstrated that
E13.5 transgenic embryos exhibit a geometric array of GFP+ speckles in the skin, suggesting
GFP expression in the hair follicle placodes. Frozen sections of skin at this stage revealed
GFP+ epidermal thickening, indicating placode formation. Growth and extension of the
placodes into the mesenchyme produces the hair follicle sheath with a distal bulb (Alonso and
Fuchs, 2003). Early stages of hair follicle formation showed the incorporation of GFP+ skin
epidermal cells into the follicle extending about one cell diameter into the mesenchyme. The
dermal component of the follicle (the dermal papilla) is GFP-negative, and clearly visible in
figure 7C adjacent to the GFP+ cells. As the follicle extends further, the GFP+ cells remain
as a small cluster, perhaps maintaining their epidermal-cell niche. During further growth of
the hair follicle, the GFP+ cell cluster remained intact at the distal bulb end of the growing
follicle, and the follicle retained E-cadherin expression, as did the epidermis from which it
was derived. Frequently, the base of each growing hair follicle (anchored in the epidermis)
lacked significant GFP expression, and the GFP+ cluster resided at the distal bulb end of the
follicle.

Mammary bud formation also occurs from skin epidermis by epidermis-mesenchyme
induction, but differs in the time of initial formation, the size of placodes, and morphology of
the developing tissue. The mammary buds were visible in whole mount immunofluorescence
of the E11.5 transgenic embryos, and frozen sections taken at the same stage show GFP+
mammary placodes. These placodes rounded up and extended into the mesenchyme,
forming the buds, which subsequently formed larger bulbous structures as they grew. The
results also demonstrated the size of a hair follicle relative to a mammary bud at this stage.
This growing embryonic mammary tissue exhibited a few GFP+ cells at the outside periphery
of the tissue. The expanding mammary tissue remained attached to the epidermis, and the
nipple formed around this attachment site.

A third tissue derived from epithelium is the prostate organ. However, unlike hair
follicles and mammary buds, prostate is formed from urothelium by poorly understood
signals at the base of the bladder in the E17.5 male mouse. Frozen sections of the prostatic
region of the E18.5 Tg11.5kb-GFP male mouse demonstrated specific GFP expression in
regions destined to become lateral lobes of the prostate. Serial sections through this region
showed GFP expression only in the prostatic region, and non-transgenic mice lacked this
expression. Overall, these results indicated that epidermal/epithelial tissue constitutes a major
target for expression of the 11.5kb-GFP transgene, and tissues derived from epithelium
appear to obtain (and maintain) stem cells from these sources.

Expression of GFP from the shorter 6.2kb-GFP transgene in the mouse embryo 

The 6.2kb-GFP transgene mice were produced for the purpose of obtaining
information about expression capability of promoter segments compared to the 11.5kb-GFP
transgene mice. In pre-implantation embryos the shorter transgene was strongly expressed
throughout the blastocyst and no delineation was apparent between the ICM and
trophectoderm cells in the intact blastocyst, or in blastocysts allowed to adhere to tissue
culture plastic. This lack of ICM specificity could be due to the higher copy number in the
single Tg6.2kb-GFP founder, and/or to the higher expression of the transgene. Regardless,
later developmental stages exhibited highly tissue-specific expression patterns, and dramatic
differences in the 6.2kb-GFP transgene expression compared to the 11.5kb-GFP transgene
were observed. The qualitative differences were not yet apparent in E8.5 embryos; however,
at this stage the Tg6.2kb-GFP embryos no longer expressed GFP ubiquitously, and only
cells of the extraembryonic membranes were GFP+. This expression was similar to that
observed with the same stage Tg11.5kb-GFP embryos; however, yolk sac expression was not
observed in the Tg6.2kb-GFP mice. Surprisingly, unlike the Tg11.5kb-GFP mice, the
E11.5-13.5 Tg6.2kb-GFP embryos were devoid of significant GFP expression. Thus, at these
developmental stages two of the most prominent GFP expression sites in the Tg11.5kb-GFP
embryos (i.e., skin epidermis and PGCs in the genital ridge and gonads) were completely
absent in the Tg6.2kb-GFP embryos.

At still later developmental times, E18.5, two of the GFP expression sites seen in the
Tg11.5kb-GFP embryos were observed in the Tg6.2kb-GFP embryos. These sites were the
blood vessels and the osteoblast cells attached to the forming bone matrix. GFP was
expressed in the smaller blood vessels, but not all small vessels expressed GFP.

Therefore, the expression of the 6.2kb-GFP transgene within the embryo was limited
to fewer tissues than observed with the Tg11.5kb-GFP mice, and represented a subset of
expression sites seen within the Tg11.5kb-GFP mice at the same developmental stage. These
results suggest that the promoter sequence remaining within the 6.2kb-GFP transgene
contains the information (enhancers) for tissue-specific expression of GFP in cells of the
extraembryonic membranes at E8.5 and in the smooth muscle cells surrounding blood
vessels and osteoblasts in late-stage embryos. Conversely, the genetic sequences present in the
11.5kb-GFP transgene, but absent from the 6.2kb-GFP transgene, contain important
instructions (perhaps in conjunction with the shorter promoter sequence) for tissue-specific
expression in PGCs, skin epidermal cells, and tissues derived from skin epidermis.

The results presented here demonstrate that the intron-5/6 region of the ship1 gene
contains the promoter/enhancer for tissue-specific expression in primitive embryonic cell
populations. GFP expression by the 11.5kb-GFP construct was first observed in cultured ES
cells, then in all blastomeres of the morula, and in the ICM of the blastocyst from
Tg11.5kb-GFP mice. Thus the initial embryo expression occurred uniformly in the totipotent
cells of the preimplantation embryo. Following implantation, the initially GFP+ epiblast lost GFP
expression; however, at E7.5 a few cells in the extraembryonic membranes and placenta were
GFP+. Observations at different times indicated these cells originated near the epiblast, and
probably gave rise to the few GFP+ yolk sac cells seen in the blood islands beginning at E8.5.
The embryo proper lacked GFP expression from E7.5 to about E10.5, but strong GFP
expression was observable around E11.5 in skin epidermis, mammary buds, developing
gonads, limb AER region, and a few days later, the developing hair follicles. From
E15.5-18.5 the AER GFP label (and structure) vanished, but skin epidermal cells retained GFP
expression. During this time, however, GFP+ cells of the skin appendages were retained in a
small cluster (hair follicles) or a few peripheral epithelial cells (mammary tissue). Also,
vSMCs, osteoblasts of developing bone, the vomeronasal organ, prostate, and a few cells of
the thymus were GFP+ at E15.5-18.5.

Several significant points can be made about the embryonic expression pattern of the
11.5kb-GFP transgene. First, a strong preference exists for stem/progenitor cells (ES cells,
morula, ICM, primordial germ cells, epidermis), but also for several cell types of yet
undefined character and potential (extraembryonic cells, yolk sac cells, thymus, vomeronasal
cells, vSMCs, osteoblasts). Second, no clear continuum of GFP-expressing cells is observed
throughout embryo development; rather, it is likely that transgene expression is turned on and
off at various stages and locations during development. Third, as observed in the
Tg11.5kb-GFP vs. the Tg6.2kb-GFP mice, distinct portions of the intron-5/6 promoter/enhancer
are essential for tissue-specific expression. Finally, transgene GFP expression was never
observed in more mature cells of either embryonic or adult tissues, and many of the GFP+
stem/progenitor cells in the embryo are also retained in the adult tissues (manuscript in
preparation). These results indicate that s-SHIP expression is spatially and temporally
regulated throughout development (see supplementary Table 5).

TABLE 5

Summary of temporal and spatial GFP expression in the
Tg11.5kb-GFP and Tg6.2kb-GFP mouse embryos.
Expression sites and scores are listed in order for each embryo age.

Embryo Age:       E2.5
Expression Site:  morula
Tg11.5kb-GFP:     +++
Tg6.2kb-GFP:      ++++

Embryo Age:       E3.5
Expression Site:  blastula: ICM; trophectoderm
Tg11.5kb-GFP:     +++; +/-
Tg6.2kb-GFP:      +++; ++

Embryo Age:       E6
Expression Site:  epiblast; extra embryonic
Tg11.5kb-GFP:     +++; +/-
Tg6.2kb-GFP:      (none listed)

Embryo Age:       E7.5
Expression Site:  epiblast; extra embryonic
Tg11.5kb-GFP:     +/-; ++
Tg6.2kb-GFP:      +++

Embryo Age:       E8.5
Expression Site:  yolk sac; placenta; PGC (migrating)
Tg11.5kb-GFP:     ++; ++
Tg6.2kb-GFP:      (none listed)

Embryo Age:       E9.5
Expression Site:  PGC (migrating)
Tg11.5kb-GFP:     (none listed)
Tg6.2kb-GFP:      (none listed)

Embryo Age:       E11.5
Expression Site:  PGC (genital ridge); skin epidermis; hair follicles; mammary buds; AER
Tg11.5kb-GFP:     ++; +++; +++; +++; +++; +++
Tg6.2kb-GFP:      -

Embryo Age:       E13.5
Expression Site:  gonads; epidermis
Tg11.5kb-GFP:     +++; +++
Tg6.2kb-GFP:      -

Embryo Age:       E15.5
Expression Site:  gonads: testis; ovary; skin epidermis; mammary tissue; thymus;
                  blood vessels (SMC)
Tg11.5kb-GFP:     +++; +++; +++; +++; +; +/-
Tg6.2kb-GFP:      ++

Embryo Age:       E17.5-18.5
Expression Site:  gonads: testis; ovary; skin epidermis; hair follicles; mammary tissue;
                  prostate; thymus; blood vessels (SMC); osteoblasts
Tg11.5kb-GFP:     +++; ++; +++; +++; +++; +++; +; +; ++
Tg6.2kb-GFP:      +++; +++


not, and these include tissues with defined or postulated stem/progenitor cell activity, such as
muscle, pancreas, small intestine and colon (Marshak et al., 2001; Charge and Rudnicki,
2003; for a contrasting view, see Dor et al., 2004). Also negative for GFP expression were the
E11.5 dorsal aorta and E13.5 fetal liver - sites, respectively, where definitive hematopoiesis is
proposed to occur, and where primitive hematopoietic stem cells home and develop
(Dzierzak, Medvinsky, and de Bruijn, 1998). This indicates that the promoter/enhancer of the
11.5kb-GFP transgene is highly tissue-specific in its activity.

EXAMPLE 4: 

Isolation and Characterization of the Human s-SHIP Promoter 

The human s-ship promoter has been isolated and compared to the mouse promoter.
FIG. 6 compares a region of the human genomic promoter sequence, comprising the 560
nucleotides upstream from exon 6 (at the 3' end of intron 5), with the corresponding
mouse sequence.

The mouse and human promoters were significantly homologous. Both promoters
contain a binding motif for p53 family proteins (p53, p63 or p73). The motif is identified in FIG. 6.
Electrophoretic mobility shift assays show that p53 from nuclear extracts of ES cells
binds to a sequence with the p53 motif shown in FIG. 6. FIG. 7 shows p53 binding sequences
in mouse, including the different half sites.
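
By way of illustration only, candidate p53 half sites in a promoter sequence could be located computationally along the following lines. This is a minimal sketch, not the method used to prepare FIG. 6 or FIG. 7. It assumes the widely cited half-site consensus RRRCWWGYYY (R = A/G, W = A/T, Y = C/T), which is not taken from the figures; the function names and the example sequence are hypothetical.

```python
import re

# IUPAC ambiguity codes needed for the consensus (R = A/G, W = A/T, Y = C/T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "W": "[AT]", "Y": "[CT]"}

# Assumed consensus for one p53 half site; not read from FIG. 6 or FIG. 7.
HALF_SITE = "RRRCWWGYYY"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def find_half_sites(sequence, consensus=HALF_SITE):
    """Return (0-based start, matched 10-mer) for every match.

    A lookahead is used so that overlapping matches are also reported.
    This consensus equals its own reverse complement, so scanning one
    strand is sufficient for it.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical test sequence: two half sites separated by a 2-nt spacer.
example = "ttAGACATGCCTaaGGGCTTGTCCtt"
hits = find_half_sites(example)
for start, site in hits:
    print(start, site)
```

A scan of this kind only flags sequence matches; binding would still need to be confirmed experimentally, for example by the mobility shift assays described above.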

In separate experiments, DNA damage caused by UV or gamma irradiation of ES
cells induced both p53 and s-SHIP protein expression. ES cells that lack p53, from knockout
experiments, neither expressed s-SHIP protein nor induced it after UV irradiation. From the
Tg11.5kb-GFP mice it is known that GFP expression occurs in cells that express p53 (ES
cells and others), p63 (epithelial cells of skin, prostate, and mammary tissue), and p73
(neuroepithelial cells as in the vomeronasal organs).



All of the compositions and/or methods and/or apparatus disclosed and claimed herein 
can be made and executed without undue experimentation in light of the present disclosure. 
While the compositions and methods of this invention have been described in terms of 
preferred embodiments, it will be apparent to those of skill in the art that variations may be 
applied to the compositions and/or methods and/or apparatus and in the steps or in the 
sequence of steps of the method described herein without departing from the concept, spirit 
and scope of the invention. More specifically, it will be apparent that certain agents that are 
both chemically and physiologically related may be substituted for the agents described
herein while the same or similar results would be achieved. All such similar substitutes and 
modifications apparent to those skilled in the art are deemed to be within the spirit, scope and 
concept of the invention as defined by the appended claims. 
CLAIMS

1. An isolated polynucleotide comprising an s-ship promoter capable of promoting
transcription operably connected to a heterologous nucleic acid sequence. 

2. The isolated polynucleotide of claim 1, wherein the promoter comprises at least 20
contiguous nucleotides from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or
SEQ ID NO:5.

3. The isolated polynucleotide of claim 2, wherein the promoter comprises at least 50
nucleotides from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID
NO:5.

4. The isolated polynucleotide of claim 3, wherein the promoter comprises at least 100
nucleotides from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID
NO:5.

5. The isolated polynucleotide of claim 4, wherein the promoter comprises at least 500
nucleotides from SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, or SEQ ID
NO:5.

6. The isolated polynucleotide of claim 5, wherein the promoter comprises at least 1000
nucleotides from SEQ ID NO:1, SEQ ID NO:2, or SEQ ID NO:5.

7. The isolated polynucleotide of claim 6, wherein the promoter comprises at least 5000
nucleotides from SEQ ID NO:1, SEQ ID NO:2, or SEQ ID NO:5.

8. The isolated polynucleotide of claim 7, wherein the promoter comprises about 6.3
kilobases from SEQ ID NO:1 or SEQ ID NO:5.

9. The isolated polynucleotide of claim 8, wherein the promoter comprises about 7.6
kilobases from SEQ ID NO:1 or SEQ ID NO:5.

10. The isolated polynucleotide of claim 1, comprising a sequence that hybridizes under
stringent conditions to the complement of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4 or
SEQ ID NO:5.

11. The isolated polynucleotide of claim 10, comprising SEQ ID NO:2, SEQ ID NO:3,
SEQ ID NO:4 or SEQ ID NO:5.

12. The isolated polynucleotide of claim 1, wherein the promoter is capable of promoting
tissue-specific transcription. 

13. The isolated polynucleotide of claim 1, comprising a sequence that can hybridize
under stringent conditions to a nucleic acid segment comprising the complement of i) at least
20 contiguous nucleic acids of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4,
or SEQ ID NO:5; or ii) SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID
NO:10 and/or SEQ ID NO:11.

14. The isolated polynucleotide of claim 13, comprising a sequence that can hybridize
under stringent conditions to a nucleic acid segment comprising the complement of i) at least
50 contiguous nucleic acids of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4,
or SEQ ID NO:5.

15. A nucleic acid comprising a promoter operably attached to a nucleic acid sequence 
from an s-ship gene or a portion thereof and a marker sequence, wherein the s-ship gene is 
disrupted by the marker sequence. 

16. The nucleic acid of claim 15, wherein the promoter is an s-ship promoter.

17. The nucleic acid of claim 15, wherein the promoter is constitutive.

18. The nucleic acid of claim 15, wherein the promoter is inducible or conditional.

19. An expression cassette comprising an s-ship promoter operably connected to a 
heterologous nucleic acid segment. 

20. The expression cassette of claim 19, wherein the heterologous nucleic acid segment
encodes a protein.

21. The expression cassette of claim 20, wherein the nucleic acid segment is a reporter 
gene. 

22. The expression cassette of claim 21, wherein the reporter gene encodes a gene product 
that is colorimetric, enzymatic, luminescent, or fluorescent. 

23. The expression cassette of claim 19, wherein the nucleic acid segment encodes a 
therapeutic or diagnostic gene product. 

24. The expression cassette of claim 23, wherein the therapeutic or diagnostic gene 
product is a polypeptide. 

25. The expression cassette of claim 23, wherein the therapeutic or diagnostic gene 
product is an RNA molecule. 

26. The expression cassette of claim 25, wherein the RNA molecule is an siRNA or 
miRNA molecule. 

27. The expression cassette of claim 23, wherein the nucleic acid segment encodes a 
therapeutic gene product. 

28. The expression cassette of claim 27, wherein the therapeutic gene product is selected 
from the group consisting of a tumor suppressor, oncogene, a cytokine, a cytokine receptor, a 
differentiation-inducer, growth factor, and a growth factor receptor. 

29. A vector comprising an s-ship promoter.

30. The vector of claim 1, wherein the s-ship promoter is operably attached to a nucleic
acid segment.

31. The vector of claim 30, wherein the nucleic acid segment is all or part of an s-ship1
coding sequence.

32. The vector of claim 30, wherein the nucleic acid segment is heterologous.

33. The vector of claim 29, wherein the vector is a plasmid, YAC, BAC, or virus.

34. The vector of claim 29, comprised in a pharmaceutically acceptable formulation.

35. A host cell comprising an s-ship promoter operably attached to a heterologous nucleic 
acid segment. 

36. The host cell of claim 35, wherein the host cell is eukaryotic.

37. The host cell of claim 36, wherein the host cell is an embryonic cell. 

38. The host cell of claim 37, wherein the embryonic cell is a blastocyst cell. 

39. The host cell of claim 36, wherein the host cell is a hematopoietic cell. 

40. The host cell of claim 36, wherein the host cell is a stem or progenitor cell. 

41. The host cell of claim 40, wherein the stem or progenitor cell is from tissue selected
from a group consisting of skin, a hair follicle, cornea, embryo, gonads, mammary gland, 
pancreas, and vascular smooth muscle. 

42. A recombinant host cell in which one or both s-ship genes is disrupted by marker 
sequence. 

43. A transgenic animal comprising an s-ship promoter region operably attached to a
heterologous nucleic acid segment. 

44. The transgenic animal of claim 43, which is a mammal. 

45. A mammal having cells comprising an s-ship transgenic sequence. 

46. The mammal of claim 45, wherein the s-ship transgenic sequence comprises an s-ship1
coding sequence flanked by loxP sequences.

47. The mammal of claim 46, further comprising a heterologous nucleic acid sequence 
encoding a Cre recombinase. 

48. The mammal of claim 47, wherein the nucleic acid sequence encoding the Cre 
recombinase is under the control of an inducible or conditional promoter. 

49. A method for expressing a recombinant nucleic acid in a stem or progenitor cell
comprising: 

a) transfecting the cell with an expression cassette comprising an s-ship promoter 
operably attached to the recombinant nucleic acid, wherein the nucleic acid is 
transcribed.

50. A method of screening for a candidate substance that regulates activity of the s-ship1
promoter comprising a step selected from the group consisting of: 

(a) contacting a nucleic acid comprising an s-ship promoter with an s-ship 
promoter binding protein and the candidate substance under conditions that 
allow binding between the protein and the promoter and determining whether 
the candidate compound modulates the binding between the protein and the 
promoter; and 

(b) contacting the candidate substance with a cell comprising the s-ship promoter 
operably attached to a reporter gene coding for an expression product and 
assaying for expression of the reporter gene expression product. 

51. A method for identifying stem cells in a population of cells comprising: 

(a) administering to cells in the population a nucleic acid comprising an s-ship 
promoter operably attached to a reporter gene. 

52. The method of claim 51, wherein the cells are in an organ. 

53. The method of claim 51, wherein the cells are in an animal.

54. The method of claim 51, further comprising sorting cells based on expression of the 
reporter gene. 

55. A method for screening for a modulator of cell function comprising: 

a) transfecting a stem or hematopoietic cell with an expression cassette 
comprising an s-ship promoter operably attached to a nucleic acid encoding a 
candidate modulator; and, 

b) assaying the cell for a cell function, wherein a difference in cell function in the 
cell as compared to a cell in the absence of the candidate modulator is 
indicative of a modulator. 

56. The method of claim 55, wherein the modulator is a candidate therapeutic agent for 
the treatment of a blood-related disease or condition. 

57. A method of treating a patient with a blood-related disease or condition comprising: 

a) transfecting a cell with an expression cassette comprising an s-ship promoter 
region operably attached to a therapeutic nucleic acid; and, 

b) administering the cell to the patient. 

58. The method of claim 57, wherein the cell is a bone marrow cell. 

59. The method of claim 57, wherein the cell is autologous. 

60. The method of claim 57, wherein the cell is allogeneic.

61. The method of claim 57, wherein the blood-related disease or condition is a
blood-related cancer.

62. The method of claim 61, wherein the blood-related cancer is leukemia, lymphoma, or 
myeloma. 

63. The method of claim 57, wherein the blood-related condition is anemia. 

64. The method of claim 57, wherein the blood-related condition can be treated with stem 
cell replacement therapy. 

65. An isolated polynucleotide comprising a heterologous nucleic acid sequence under the 
control of a developmental decision promoter. 

66. The polynucleotide of claim 65, wherein the promoter is capable of providing 
expression in embryonic stem cells. 

67. The polynucleotide of claim 65, wherein the promoter is capable of providing 
expression in adult stem cells. 

68. The polynucleotide of claim 67, wherein the adult stem cells are differentiated but not 
terminally differentiated. 

69. The polynucleotide of claim 65, wherein the promoter is capable of providing 
expression in adult stem cells that are in growing phase. 

70. The polynucleotide of claim 66, wherein the promoter is capable of providing 
expression in a cell from mouse embryonic development stages E3-E18.5. 

71. The polynucleotide of claim 70, wherein the promoter is further capable of providing
expression in a cell that is in a developed animal.

72. The polynucleotide of claim 71, wherein the cell is a stem or progenitor cell in the
developed animal.

73. The polynucleotide of claim 72, wherein the promoter does not constitutively provide
expression in the stem or progenitor cell in the developed animal.

74. The polynucleotide of claim 65, wherein the developmental decision promoter 
comprises an s-ship promoter region. 

75. The polynucleotide of claim 74, wherein the s-ship promoter region comprises a
sequence that can hybridize under stringent conditions to a nucleic acid segment comprising
the complement of i) at least 20 contiguous nucleic acids of SEQ ID NO:1, SEQ ID NO:2,
SEQ ID NO:3, SEQ ID NO:4, or SEQ ID NO:5; or ii) SEQ ID NO:6, SEQ ID NO:7, SEQ ID
NO:8, SEQ ID NO:9, SEQ ID NO:10 and/or SEQ ID NO:11.

76. A method for expressing a nucleic acid in a stem cell comprising providing to a cell a
polynucleotide including the nucleic acid under the control of a developmental decision
promoter, wherein the nucleic acid is expressed in the cell.

[Figure: annotated promoter sequence, rows labeled -100 and -50, with Gata/Lmo2 and Initiator sites marked]

[Figure: CLUSTAL W (1.82) multiple sequence alignment]

[Figure: p53 binding sequences showing 5' and 3' half sites (#1, #3)]

SEQUENCE LISTING

<110> ROHRSCHNEIDER, LARRY R.

<120> METHODS AND COMPOSITIONS INVOLVING S-SHIP PROMOTER
REGIONS

<130> FHCC:01SWO

<140> UNKNOWN
<141> 2005-03-18

<150> 60/554,318
<151> 2004-03-18

<160> 25

<170> PatentIn Ver. 2.1

<210> 1
<211> 100140
<212> DNA
<213> Mus musculus

<220>
<221> modified_base
<222> (27350)..(78168)
<223> N = A, C, G OR T/U

<400> 1

ggcaatttct gagaggcaac aggcggcagg tctcagccta gagagggccc tgaactactt 60

tgctggagtg tccgtcctgg gagtggctgc tgacccagtc caggagaccc atgcctgcca 120

tggtccctgg gtggaaccat ggcaacatca cccgctccaa ggcagaggag ctactttcca 180 

gagccggcaa ggacgggagc ttccttgtgc gtgccagcga gtccatcccc cgggcctacg 240 

cactctgcgt gctgtgagta cccgtctcct cccaactgtc agatccaggg accactgagg 300 

tgtggatcca aagggggaac ccctgtaatg ggagtttgag ttaggtttat gtcataggat 360

ggtgggacgt gactggcact tcgttgccct gtggggaggg gagaaggggg ggcagcatct 420

gaggcccact ttggaccttg gcgttcgagt tcagggagcc tgtgtcatga caggcttgtg 480 

tggtgctagg gctctctgag tgactctggg cctcccctat actgcagccc tccatgacct 540 

gtggtgccag gggtctccgt gagttcctgt ggcaggagca gggacagagt caggaagaaa 600 

ctcaggcctc tctggtggag ggtgtattgg aatgcatttt ggtcagctca agcgtcagtc 660 

agggactcag tgaagggcaa ccttggcaaa agggtcccct cctccaccct gctagtatgt 720

ggtcttagag ctaattctat ttggggagct gagtccgggg tgcatttaat caggatagga 780 

ttcctcgtgt acacatttta ctctaggctg caaggacaca ggaagccaca gagctgctct 840

ctgagtaggt ctctgtccct ccttgcactc agctatgtcc ctccctatcc aggtctctgc 900 

ccccgttggt acccccccac agcccagtgt gaagatgttg ctgaatatgc tggttatcct 960 

aacaacagag gaaacagcca cttgctgaag gttcettttt aacactccgt ccgtctagcg 1020 

tcttctgagg aaggccgcca cccctatagt cctgtggtca gaccctgtcc aggcttcagg 1080 

ctggagcagg gcaggaacac tgtcaggaag ggtgtgccta cttgaacagc acaggtacca 1140

tctgatagat tgtccctgga ctgagagaaa gatctttcag agcagctagc tgcccccccc 1200 

ccaatcttca tgcaggaggg aagtgggtga ctgaccacag actgttctga gctctgactc 1260 

atgttgggac ctgctgtgct aggcatttgt tggctagttt catcttcaag agagccagga 1320 

gtgtgggttc tatgaacacc tagacctgcg attaaggacc atgagggctg gacaggttac 1380

aaacatcgag gataaggtcg gtcatttgct cccagagacc actagcttgt gctgcctacc 1440

tcctaccacc atcaggctag gcatgccagt ggacttgaat ggaagtaaga ggagctagag 1500

tgttcacaga gttgggggca ggggcactag aagcccgtac aggctcacgg ctggctgagc 1560 

atcttgtgct ggtggagtta gcagccagct tcctgcacac ccaccagcat atttcaggag 1620

aagcactgca ggtggcctgg acctccccaa agcactgtct cttggggaat ctaccagtga 1680 

gaggccctga aaggaaagag ggcaggaagg tacttttcag tgtgtcacaa gctcagcctg 1740 
actctactaa cccagttata tttttttctg tcccagtggg atttgggcca agggcaattc 1800

tgttgggagt tggctatggg ctgagagact gttgtatgca ttggttacaa atgcacacca 1860

cggtatgggt tctcttgtgc taaactggca gcatctggaa ggctggagtc agagcagaag 1920

gagcctgagc caagaccaag ggatcttgga ggaccttcgg gcaacacaga ccttgccttt 1980

tttcttcatc tgactcccct gcctcagctg tcttaagtcc aagcaaagca aagatgacac 2040

ggagattttc aaactaaagg aatgactacc acaacctcag tgttctataa tgaccccccc 2100 

cacacacaca cacaccctaa aaatgtagga aaggcaggac agtggtggtg cacaccttta 2160 

atcccagcac tcgggaggca gaggcaggca gatttctgag ttcgaggcca gcctgatcta 2220

caaagtgagc tccaggacag ccagggctac acagagaaac cctgtctcga aaaaaaacaa 2280

aaaaacaaaa aatgtaggac aggcatactt tttaatttga aaaattgtta gaagcctgcc 2340 

ttttcatgca aagagactta acttcctgaa aaaaacaaac tttagatcct tactttctcc 2400 

tgttctctgt ggatgatgga acctccccca cctcatcgct gccccctcgc catctgccct 2460 

ggccagagtc caggctcctg cccaagaaga taagtcagca gcttgtagga cagcaacaag 2520 

gtcgaggtca gagatggggc tgtgaaggag agatggggca tggggtacat gtgggataca 2580

gggcagctga gttctctttg gtttcaggag tgatagattt ggcacgtgtg gtgtcttgct 2640

ggacatcagc cagtctgtgt gggctctggg gcaggcagct gtgggtcagg tgccgctggg 2700 

actccatcca tgttccttct tgtgaccgaa gggacaccaa caggggccca gtgcatgtgt 2760 

tgggtttgtt tggccctctg ggaagaatgc agcattgtga ggagaacctt cctgctctga 2820

gatttgacta gacatgacat gggagaggga ggtaaatgct taaagacaag gttgcaattc 2880

agttccaccc acgtgacacg agtgcacatt caccccacac attgaccttt gtttccttca 2940 

gagagatcaa tcctgtcact tatcaccaag caagaaggct tttctttctt ctcagtcatc 3000 

attcagagag ccttggttta ggggagtctg gagttacact ggggccccgg agcactggcc 3060 

tggggagacc ttgctagtat gactggagtt ctgttctacc ttccttgaaa gggaatgtgt 3120

gccttttgag tggggcctgt atcaccttca cttgagtaga gcctgtgtca ccttcactca 3180

tttgtactca gttcctccgt gatgtcagct cccttcccag gagcctgtgc accctgttgt 3240 

ggggtattag ccaggtggat ggagacctat taagagtctc atgagcaggg acagcgcagc 3300

tacaccatgt gttgcagaga agacaatgct tctgagggta gctcagcaga tgccgggttt 3360

gtgggtcatc ccagtagctt cttattgctg aagctacata gcaagaattt gaatgatgac 3420 

ccagcacttg gaaacaactt gctctttcta aaagagatga caaggccaga ttcaacttgg 3480 

tcaagatgac tgttgtctat gtgaatggca tttcccctaa actactctgg agtgcttcct 3540 

cccttgcagg gagaatatgg ttgcctctgg tccagccccg atgaaggatt ccctaagcaa 3600 

gtggtcttcc atagtgcacc caggcctggt ggtggggtaa gctctgtccc agggatagaa 3660

tgccaatagc cttggtagct ctggcagtgc agaaagaaag gagaaaaggc atgggacatt 3720 

tacaaccaaa actgccctca gagaaggcat tctagtcttt tgaaagaaac ggtgtgacca 3780 

gacactgggt gtgataagcc tgccagggga gataaaaaca ggccctgctt ttaggactac 3840

tggagagcag ggtgaagatc acacacttat actgtctcac acttgttctt tggtaggaag 3900 

agaactgcag agaaggagtt ggtgagggtg aaaatgccta gcagagggag tcagggcatg 3960

aggtcatctc ccctccccat cctccattga aaatgtctat aaggtttcca cagcatgata 4020 

atggcttttg tgcaaacaca gtgtggcact gttttccata tctggcatca aaggcaaatg 4080 

agggaaatac ctgcataggc agaacccaga gctgaaggcc tatgggctcc tgagaaacat 4140

gagaaaaggc ctttgtttga gaaggatgct cttgaagact cattgtgtcc aagaccagga 4200 

ggaagggctg aacccaggag ggcctattta aggcctattg tatcatttat gtaagtggca 4260

gagtgactat gtttctgccc atacccagta ccctggagct gtcttcccag gtacagggta 4320 

ggcactggct taggcgcata ggttaaattc acctacaacg caaggccatt agcacttctt 4380

aacgatccca tctctctgcc tgccacagag atggaacaag aactgctatc gttacccaat 4440

tcatgccagg ccacgtagtc ccaaggagca agactctgga gctgcctgga atctcttgtg 4500

tcatgtcagc aagacaccta gaaccccagc tccaaagaga ggctggccag accagttcac 4560

tggctttaca gtgcctcagc tgaggttaag gtacccactg ttaagtcacc catccacatt 4620 

ctagtttgtg gttcaatggt ccttagtaat ggtcacagag ttacgcaact gacaccacca 4680 

tctaatttca gagagttctc attattccca gaagaactgc cccccccccc cacacacaca 4740 

catgtgagca gctgcttgct tttcttctga cctctggaaa gagccagtct actttctgcc 4800 

tttaaggatt tgcctgactc ttgcattttg tgtatgtgaa attatatggc gtgcagcttt 4860 

tgacatctgg atctgtccac ttagcacaat gccctatgct cattgaattg tagcaggcat 4920 

ccaatggttt gataacccac tgtgtggaca caccacattc cccggttgag tggcgttttg 4980 

cactgttctt actttttgac tgttttgaac aacagttgct gtgaacattc atcaaccagg 5040 

ctctgtgtgg atgtctgttt gcaggagtct tggggagtga gaggcagggc tgggtcatgt 5100 

ggtgattttt atgtttagcg ttttgaggaa gtactgaact cttgtgctca gctagagctc 5160 

taaccagaca ttgtgagggt ccagtttctc ttctagttac tctctctctc tctctctctc 5220

tctctctctc tctctctctc tctgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg 5280 

tgtgtgtgtg catgtgtaca tttgcacact gtgtgtgtgt cagccaaaag acaacttcag 5340

gagttagttc gctccttcta ccatgtgggt cccagggact gaactcagat agtcaggctt 5400

aatgacaagt acccttaccc tctgagctac ctcactggcc ctaccttttt catttcttta 5460 

tttaagacag aggctctcaa tgccctggaa cttacgtagg ttaggtgggc tgatgttaga 5520 

agcaccaagc ccagttagtg gtggtcggtt gcttgttaat gtagtctggg catcaaactc 5580 

aggtgcttat gctttcaagg gaagcacttt atcagctctt acctccccag cccctcactt 5640

gtttgtgtgt ttgtgagcta tggtctcaca tagcctaagc cagcctcaaa ctccctattg 5700 

tagtcaaaat tggctttgaa ctcttgatca tcctgcctcc actttctaaa tgtattcacc 5760 

acaatatctg gctgtctttt tattctaccc agcttagggg ctgcaaagca gcacagcatc 5820 

taactgtggt tttagtttgt atccctaatg ttaataatgc taatgggtca tttgtatgtc 5880

ttacttggag aactgtctgt atagtctttg cacattgaaa tagctatatc atttcaatga 5940 

taaaatagga agacaggagt aatgtggata ttccatagcc taacttgaaa cctctaggtg 6000 

ctttgcatat gtaaaaatag agctggccat tagtttttag agctgaaagc aaccagtgat 6060 

atctcagaag cagaagaccc atctgtggag aactttccat gacaggagag gagagctgtg 6120 

acagtgtcac ttccgggact tcctggaggc ctctgggaga cacagagctg tcatgtgggg 6180 

cctccacaga ggaagtgctc aaagtgactg aggcaggaag aggacttcaa tgacgcaaga 6240 

gtgtctggtg gctgctgagc cacaggggtt gacagcctgg aagcctgggg agggaggagc 6300 

ctgggtacag acagcaagaa atgcttagag gactgggtat gaagatgaag tctaggagag 6360

aggctggccc ccaccacctt ctgacctgag ctccaactta gtaaagagat gccgaaggga 6420 

tgaagtcctc tgatcgctaa aatgctagct gttctatggg aggaaacacc atgttggtgg 6480

gcacttgccc tttgagagag agggtgacgg aggtggtgag ttcaatctcc acagcccaca 6540 

tggtgcatgt ttgtaagccc agttctagga aggtggagac cggagggtcc ctagggttca 6600 

tgggccagcc agactagctg aattagtgag actcagacca tgtctcaaaa caaacaaaca 6660 

ccaaaacaac aacaacaaac cctagagggt atctgaagga ggacatcatg attgtcctct 6720

gatctctact cacacattca tgtatgcaca catgtgccct tacacacata catgggggca 6780 

tactgttaac atgtgccagc acccatcaat ttgggtttac tttgcaattg aagtaacact 6840 

gtgaagggct ggccaaagga acctctgaaa aaagagagtg cccacgtggc cctgcagctg 6900 

gaataggcag tgtagaggtg gacagacctg gtatgaaaca gcaaagtctt ctttcaggtt 6960 

aacagtagta ttggctgtgc ggtggtacac acaggtgact gcacactccg aaagctgagc 7020

aggaagattg ctgaaagttt cagaccagct tcccaggcta catagtgagg ccctatctca 7080 

acttaaccct cacagtgagt taagcataaa gtaaaaagtg taaaataaac aaaaattgtc 7140 

acccaatcac taagagacta gtgcaagcca tgtgaatttg ttgatgtatc ttctcatacc 7200 

catttctatc acataatata catacacaca catacacaca catacacact cacatatatg 7260

tacacaccac ctttcatttg tggtctgggg tttccaacaa catcaatttt tcctgcctag 7320 

ttagaatttc ttccttaaaa tcactcttgc tggaagccta ggcttctctt ccagtatgtt 7380

tatgctttat ggaatgattc tttgattgac atttacatta ccagcaatct tgctgtaagt 7440

gtggcagcag caaaaaaggt cttccagcag aattgctaca tatgttactg gttacaaatc 7500 

cccactgcag ttagagcacc agcaggaatc gtcttcaaag actggacaca aaatggcaac 7560 

ttgctttcca gaaaaattga cttaatgtat gcttctggca agcagcatga gaaaatgccc 7620 

acttccttcc aatatcctat tctacggaat gttactttta agaaaattgg agtcataaca 7680 

tgaaaggtca tctgtgatca ttttgttttg cttttgattg ctgatcctgt ttgagaggtt 7740 

tgctggccat tttcttttgt ggatggtcct ctattctctt agcttctttt cttcaattaa 7800 

gactgtgtgt agcttgggaa gaagggttgt gataagtccc aggctaccac ggactgtata 7860 

atgaggtctt atcttaatag agggtggagt ggagctggag atatggccca gtggttaagt 7920

gtttgcttag tgtgtgtgag gtcctaggtt caatccccag taccaaggaa tgctgcaagc 7980 

tttgccatct gcctatgtga gttctgtcat gtttgtgaca aatatatctt cacgatggtc 8040 

tgcctttaag gtttagattc tctgacatgg gagggtcact gttttctact gggacttctt 8100 

cctgggattt ctaacacaga gactccttct gcacacagaa atccactcag aaaggttaag 8160 

actgggtctt tgtagctcat ccacttgggt taagggagga acctccagca gtttatcgaa 8220 

ttacactggg ccccagcctt gtcatctgaa aaccaagtat aagaggatct atgtggtaga 8280

attgttgcat gtgttgatgt gtgtatctgt atcatgtatc atgatactaa caaatgtagc 8340 

caccgtggct attattttta tatgttttat tgttcactca atttagcttc aatagctaaa 8400

tcttcttgga cttaatagat gacaagtgat gaagatttat gtgtattttt ctcaagtaac 8460 

catccttccc caacctttta ttaaatgact tatgcttttt ccctccatga ttcctttgtc 8520

taccatatgc taagctcttt ctgggttgtt tatctgtcca agaataccca actacaccac 8580 

tacctttaca aagacataag cataattttt caaaatgttt tggggttatt atctcttata 8640 

ttcaagtgtt ttcccaggta aaccagatag taatattttt caacttggac aaaacttact 8700 

gttatgactt tcactgggat ggcaggaaat cctaatgtaa gctgtgagct tgacagtctt 8760 

attgtgctcc ctctctgatt gtgtaatata tttcacacag accctgagtg tttcttgtct 8820 

gggtctctgc tgatgacttg tatgtttttg ttattgttgt gaatggcaca tttctcccgc 8880 

agttactggt tcatgataag cggttgcctg catagaattt tatgctttca gctttcattt 8940 

tgtcccactc tttggccact cctcattcat ctttcagctg cttctcctgt cttttctttt 9000

taaagagatc ttgttatctg tggtgtgtgt gataccatcc atgagcacag atgtgtgcgc 9060

acacatgttt gtgtttgcct gcacatgggc caggaggtga gatgaggaca cctgaaatca 9120

tgatccatca ctctctgcct tgttcacttg agacaatgct tctctctaag catggggctt 9180

gctgattttg gctatgctga ctgaccaggt ccagcaagcc tcctgtctct gctacccacc 9240

caccccacct cagcactgga tttatagatg tgtgtgacca cagctcagtt ttcttaaatt 9300

ttcagacagg acctcactat gtatccttgg ctggtttaga aatcacaaag atgggcctac 9360

ttcccaagat ccagtattca aggcacacac caccacacag actctctgtc ctgctttttc 9420

acgtgggctc cggggatcta aactcaggtc tttatgctca cacagtgagc actcttactc 9480

actgaacgtt ctacccagcc caactttfcct gaataggcct tcatatcatc tgcaaaccag 9540

gatcatagtg tttgcttcct aatgcttaca accactttct caaatgccat agcctattat 9600

attggccagt atttctaaag tggtagaghg gcaactgttg gaaaaaggaa acgtctgcct 9660

taaggagagc ttcacatacc ctgccaccgg gttgcctttt ggtcagggcg ttttattaca 9720

agaatggaat tgggacatga gggttctggg aaggcagata cctgtcaggg cctccccatg 9780

ctgttttgaa acaatgggtt tatttaagat agagatagct cctttatatc ctttgttggg 9840

ttggggccct cttttgagtg gtgataaggt ctggtgattc tttttctttt tctggcaact 9900

ggggactgag ggaaacagct tcttttagaa tcaaaagcca ttttgcccaa catatcttat 9960

gaaagagaga cagacaggca gacaggcaga aagaaagaag aaaaagaaaa aggaaaacac 10020

caagatgaat ggggatgaaa ggacccacca aaccaacaag acacatgtag tattgggcca 10080

gtggtaaact cagacagaca gacagacaga gggctccctt gtgagggcct aactcccgtg 10140

tctttgctta aagcattagg tcaagagtga ccgcttgggt ccctcctgct gaggagggtt 10200

ctctccctgt ctctccccca cctctcacct gttggccacc tcaaccctct tcctgttttc 10260

tctcttttgg aggggcccaa ggaagcaggg gatagcctct gttaatatat caagtctcag 10320

ccctgtgact ccagtgttct ctaggttgga gtagcaagaa taaacactct tgcttgctgg 10380

ggcttaggta cattttctct ggttctgtgc tagccaatat gccttgcctc cctctttatt 10440

ggtgcatagg gacagggtca ctatgtaacc caggttggct ttgaactctc tatactcctg 10500

cttcaacctc ccaaatgcta gggttgtagg tgcacaggac catacccact tccatatggc 10560

tttctgaatg caatctttcc attttgactg ttttctaccc acctggccca gaagtgtaca 10620

gcatttatcc aagaaaagag aaagcacact gcaaagaatg ttttctagaa attagaagtg 10680

acacataact agagttctgg tgggatttac aagagcagca tgtggagcga gacacagcct 10740

ggactaaagt ttggactaat gttttcataa gggtgatggt tatgatgctg tcgaatattg 10800

acagtgctca ggcagggtca tctgagatac ttctgttggg ttcagttctg attttggcct 10860

taatcgtgtg tggttgtttc tagaatatct tttaaaaggc caggcaagcc gggcgtggtg 10920

gtgcacacct ttaatcccag catttgggag gcagaggcag gtgaatttct gagttcaagg 10980

ccagcctggt ctacagagta agttccagga cagccagggc tacacagaga aaccctgtct 11040

caaaaaacaa aaaaataaaa ataaaaataa aaggccaggt gggctttttt tcctacactt 11100

ggttaaattt tcctttttgg gtcagccata tctttatctc tctgtccttt ctttctgtct 11160

gggagcccat gtgtttcacc ggctccttag ggctattcca taagtagctg ccctcggcta 11220

aggaaagcca ttctcacgct ccctcctttg cttaggagga agggtcagag ctcctgagag 11280

ataaggccag gtcagatcag ggcagagatg gctctacaat tcccctgcct ggctctgtac 11340

gggttgagcc taggagccat aggggcgtat aagtgggcgt ggtgaggatt ctctgtgttc 11400

cagtgatatc tctgagaggc gagatgctat cgggccagat ggtggtgata tcgacctggc 11460

ctgtgaaatg ggtggggttg gggtggaaca ttaagttgtg actagagcca gaagagcctc 11520

tcaagagaag ctttcccatg gggaatgaag caacgaagtc atgaactctc gaaaaagcat 11580

tgccccacat ttactttaag tatatatgaa ttcaaaacag taaaagatca taaagctggc 11640

tacatagaaa aatataaatg tatacagcaa cggaagggaa ccaacaaaca ttgttgaaat 11700

gaatatcaaa atccacagta gcatagatca gagtcttgtc tgataataag cccaggatga 11760

tgtttaaaaa aaaaacaaaa aacaaaaagc aaagagatgg aaagggattc aaagaaaaaa 11820

aaaacagaaa cagccaagaa caaggacaaa ccaaaccttt ctcgtgggga gaggcaaatt 11880

aaggcaagct gccatagttc tccccctctc agatctgaaa agagatctga taagccagtc 11940

agcaggctgt ggggaagtag acactctcat atctgttagg aggaaggaaa aaaaacccga 12000

cagactcact ggagcgcagc tgtcagtctt tatcaaaatg taaaatgcca gcaagacatg 12060

atggctcata cctgacatcc tagcacttgg gagactgagg caggaggatt atcttgtgtc 12120

agaagccgac ctaggctaca taaaagtgtg cgtgtccacg caagcatgtt tgtaaagtgc 12180

ctatgccttt ttgtctggct gactctgtct agttcattgt ttatccacat aactagcaac 12240

ccctgacacg tgtggtgtcc agtgtgagcc aggcgctgtt ccaaaggctt tcgatacatg 12300

gatgctttta gtcttcccag caacgaatag atatggcatt gctgtcatct ccattttaca 12360

atgaagaaac aaaaacaaag acttcagcca tgtatgcatg ggttaaccat gtatgacaca 12420

cggatgtagc catgtataca cacacagatg ttgccatgta gccacatata catggatgtt 12480

tgttcaccac aacattctta cacaggcaaa acattcaaaa taactaccac caccaggctc 12540
tggtcaaggg
ctgtcatgtg 
tgtataaaag 
gggcagggaa 
ctcaatgtct 
gttttaccaa 
aaaccaccac 
acataaagag 
acatatatac 
aggataagtc 
ctgtgtgtgg 
cacatagccc 
ttctcttgcc 
atgatgggaa 
tgcacccagc 
cagcactcag 
gtgagttcca 
aacaaaacaa 
gtaaatgtga 
tgaggttgat 
tgcagtgtct 
ggtgcaggcc 
gtgctcttaa 
cattgttgat 
actgttcatg 
ggaagacctg 
gaactcctga 
ggtgactggt 
tcagagattc 
taaaccaaaa 
gagctctcca 
aaaacaagga 
ctggggagcc 
catgatacgg 
cattctcttt 
tttagatcag 
tggtgtgctg 
ctctgcctca 
cagactcccc 
atggtttgta 
gattctgccc 
tttccctggg 
tgagaaccta 
ataacagtga 
ttggtggcca 
ccttgattta 
agcagaggga 
tcccatagct 
caaatataca 
tgagttatgc 
cfcctttctgc 
gcctttaccc 
acaaacaaac 
accacccttg 
gctgagtgat 
gggcgtgcag 
attaagctca 
gtgcagtaag 
cttcaggatt 
tccggaatgt 



catgacaata 
tgtcttggtt 
gatatcttgg 
cacggaagca 
acccccagtc 
ctgggaacca 
aatgtggaag 
taagatttta 
acatacactc 
taagaaagtc 
tgcctttttt 
aggctggcct 
tctacctcct 
ctgagcttag 
cctcgatata 
gaggcagagg 
ggacagccag 
aacaaaactt 
aattgtagtg 
ttttatatca 
gtggagacca 
tccatgtggg 
ctactgacat 
agagttttac 
agaaccccaa 
gtgatggtga 
ggccaccaac 
cacctacggc 
taggaaagtt 
atgtagaagg 
acccagtgtc 
gtccaggagg 
tcaccctggc 
atcttgctga 
gatgatgtgg 
accacatagc 
aggggcagga 
ccatcatccc 
cccaagctgt 
atgccgtctt 
aatgaggacg 
agaaaagaaa 
gtagagctta 
ctcctcatag 
tggctgagaa 
ttcttttgtt 
ggtgatcttg 
cccttctgcc 
aagccataaa 
cacagtatgt 
catgtgggtt 
actgaggcat 
aaacagacaa 
cagtgtttgt 
gtggaagcca 
aaaggaggtg 
cagcgaccat 
gaaagacaag 
ttgctgaaaa 
gacaggaatg 



catgtaatgt 
acttccctat 
aattcatgat 
ggcaggtagc 
acacacctcc 
agtgtttgaa 
actttgatga 
tatatttaaa 
acaatctttt 
atggatgata 
caatatataa 
ccaaccctct 
gagtactggg 
ggcttcgtga 
aaacttaaac 
caggtggatt 
ggttatagag 
aaactgagtt 
ttgtgtggca 
tgggtgtatt 
gaagaggatg 
tgctaggaac 
ttctctccag 
agcccccact 
atagaggaag 
tgcttcccaa 
agacccccat 
cccttcccag 
aaaaaaaaaa 
agctcttggc 
tgaaaggggt 
agccaagtag 
cgatgtgctc 
gctcctggga 
tagctgaggg 
tatctagcag 
acgctgtgct 
tcagaccacc 
gtcagacagc 
tgatttattt 
ataaattcac 
aaagagaaag 
gctttgtccc 
atgacagtta 
aaggacacag 
tgacttccat 
caagagagat 
caaagcattg 
taaacttgtt 
gtgtgtagag 

ctggggattg 

ctcaccagcc 
acaaacaaaa 
gtgacaaaca 
gagaagctag 
acagccccca 
agtaagagtc 
catctaggga 
gaaaggggtg 
atgaattgtt 



ggtagaatgc 
tgctatgaag 
catccttagg 
tgagaactta 
ccaacaaggc 
cataggagcc 
tattttatga 
aaatttaaag 
atgcatgtgt 
ggagatataa 
tctttttttg 
atgtaactga 
attattggtg 
atgttaggca 
tgagccaggc 
tctgagttcg 
aaaccctgtc 
taatagtaac 
tgtggaaggg 
tcctgcatat 
ctggattcct 
ggaacccagg 
cctccctaac 
gcctttgggc 
acgggtaaga 
aactgctcac 
ggccaccatg 
agaatcctta 
aaaaaggtgg 
tggttggcac 
aggactcatt 
ggggaatgag 
cttccgcagt 
gttttggctc 
gttgaggaag 
tgacgtaaat 
gtatgcagga 
gtcatccccg 
cctgttctat 
caggttccgg 
tgttcaggta 
gaaaaacagg 
ccaaagctat 
taagatgata 
tggtgaagga 
ggatgagcag 
gggagattag 
gtcatggcat 
gattgattga 
gtcaaaggac 
aactcaaatc 
acagctggat 
aatcatcagt 
cagggacttg 
tttttgcctg 
tgtcctcagt 
acaacagcag 
ctgggtcacc 
ggaagggaga 
tcgatcaaag 



tgagaacatg 
agacaccatg 
cttagaatcc 
catcctgatc 
caacctccta 
tatgagaaac 
agaaagcaaa 
atatatacac 
tttaaaatat 
tttttttcat 
tttgttttgt 
ggctaacttt 
tatgtcgcca 
aacactacca 
gtggcgcacg 
aggccagcct 
tcaaaaaaaa 
tgtgaaattg 
attagtgaag 
atctatgtgc 
tggaactgga
tcctcctaga 
tctcatttct 
aataggagta 
ggccattcta 
agctcagctg 
ctaccagata 
attttaataa 
gggagggggg 
ttcctctata 
gcaggaagaa 
gtctcagcag 
tgtagctgag 
ttcacacctg 
cccctctaga 
ggcatatcca 
ctgtgaggca 
acatttgcag 
ctggctctgt 
aattgtgttt 
agtcccaaac 
tattctgaat 
gtgtctgggg 
gtcatgcatg 
aagccaagcc 
atgtcagtgg 
atgagatggg 
ctgaatcact 
ttgattgatt 
aacttgaagg 
ctaaggcttg 
tttaaaaggg
gccaatcaaa 
tctgaccaca 
acttcattgg 
agaggagagg 
atgggacatc 
agaacaacaa 
tgtgaggcat
gggtagactc 



agcaaaccat 
accaaggcaa 
atgattatca 
cttttacaac 
atctttccca 
attcgcattc 
ctttagaata 
acacatatat 
tttataaata 
tctatatctt 
gagacagtct 
gaacttttga 
ttcctggctt 
accaagcaaa 
cctttaatcc 
ggtctacaaa 
caaaaaacaa 
taagttaata 
tgacttttta 
actctgtgca 
attaaagttg 
agatcatcaa 
agattagaag 
cctagctttc 
gcaggattaa 
tggggccact 
ccagatggct 
gttcttcctt 
ctgggattct 
gaaacaacca 
ggaagccatt 
aaggtgggta 
agctggtcat 
tctcttgctc 
tgatggagca 
tggccactct 
tcaggttcag 
gctttgtacg 
gttttcctgc 
acacttacag 
caacttgtgc 
agttgcaatg 
aactgagtga 
ctaaggatta 
agtgcctatg 
cccaggaagg 
gcctttaaat 
gtaaaaaggc 
gattgattga 
agttggttct 
ggtggttact 
aaacaaacaa 
gccggggaag 
gagtgtggca 
ggaagcatgg 
gctgtgtgtc 
tgtgtgagca 
ggagcaaagt 
gttggtggct 
agagctgcct 
tgttgaaagg
gaacagaggc 
acttccaaca 
tgtggcaaag 
agcgtgaacc 
gcacagttca 
ggaggcaggg 
cattggtggg 
ttggcaggtg 
catggctcag 
ttttttttct 
tggcttcctg 
ctcccctttt 
aaagaggtga 
ctctgcctgt 
gaccccagaa 
ccctctggct 
aggacactag 
cctcagtgag 
tcagatccca 
tcaccagcta 
gacagatgag 
gtttttatag 
tatgaccttc 
tgaaagtccc 
acaacaacaa 
gagagcctat 
atagtagcac 
gaaagtcttc 
gcacacacac 
taaggaacaa 
gttggtattt 
ctgcattctt 
tgcttagtgt 
ttcatggtcc
ctttcccaag 
acccccacca 
gtgggaaggg 
ctgcttccaa 
tactgaaaca 
tggtctctct 
cttggctgtg 
tgaagagacc 
tcacgaatga 
agtaaggaga 
gctcaacggt 
cacatggtgg 
gacagctaca 
aagaagggag 
tgggagggaa 
aaggaataat 
tgtggacgtg 
agctgcaaag 
tggaggatgt 
gagacaggtt 
acatcctctg 
cgggccccgg 
tctacaaagg 
agagtgagct 
ggagggtctg 



cagtccttag 
aaagcctcgg 
gtcacttcct 
cttccgtcta 
actgtgggct 
tactttttct 
cggatggaag 
aagcaacaga 
ttcgttgcct 
gggaggtggc 
aagcctagga 
ctaggagtca 
gtggtttctt 
ctctgcggct 
ctggagtctg 
atgaccacag 
gatttttttt 
agaagcccag 
tcagctcttc 
gagggaacct 
ctgagctctt 
gacagatcat 
aagagatatg 
ttggtcatct 
cacctataag 
aatccccaca 
ctagcatgtg 
atccctgtaa 
aggtacatag 
atgcatgcac 
atcaaagctc 
ggccactgct 
tcctgagctt 
cagcagcagc
cagctcttct 
gacaatctgt 
ctcagggact 
tggcagcagg 
gctaagtctt 
cttcggcagt
gagatgggcc
tgtcctgtgc 
aaacaggaag 
cagggatggt 
catccatgtc 
taagagtgcc 
ctcacaacca 
gtgcacttac
acagccacat
agagatgctg
gagatgaggg 
atgggtacgg 
gtgatggagg 
ctatccctgg 
taacgcagat
ggtactgtgt 
gttcccatgc 
aagtgactct 
gttcacccag 
aggcaggaca 



gcaaaaggcc
agggcagctc
ccagacggtg
ggggaagggt
ggagtaacgg
cgtgtgcgca 
gtgtggtcta 
acaaatctga 
gaatactttc 
ttaggcatgc 
ctgtgtgaac 
actctcttaa 
ctgttccttc 
tcttgttgag 
tacccacgtt 
accttcagtg 
tttttttttt 
gccttgggac
tctgtgggtt 
gattggctca 
tgctgcttgg 
agcctcctgt 
gccttgctgg
ataccagcca 
gatgtcttgt 
catttacaag 
cagacatggt 
tcttagcact 
tgagttagag 
atgtacagaa 
acagccagtg
tattttaact
agccacagtt
aggagcaagg 
tatgggaggt 
atctcgtggt 
cagctctgac 
attctcttgt 
ctttctcagg 
atgggaccca 
cttccttgcc 
attcaggatt 
atagaatgtt 
ctccatggag 
cagttcttgg 
gactgctctt
tctgtaacga 
atataataaa 
gttcttcagt 
agaacagtat 
aaagataatg 
tttctacaaa 
atgaagatgg 
ctctgctccg 
tctcagaccc 
gaggccagag
ctagaggcgg
cccctcaggg 
ttgttgcctt 
agaagtctta 



ccagaaacat 
atgatggaat 
ggactcgaag 
ctgctcatac 
ctgtactgca 
tgaacaccag 
ggggtcaggg 
gtacccgcaa 
tgagatgtgg 
agccgtggtc 
tttgcctggt 
gttaaaagtt 
aaagtgatga 
ggtttgtgcc 
ctcttcactc 
ggactaggag 
tgcaaaggtt 
cagaggctct 
gctcctcccc 
gccaccaaca 
ctgttaggaa 
ggctacctca 
acacaattgg 
agggacagga 
cctataaaaa 
tgctggagat 
tcagttccct 

tggggataga 

atcggcacac 
cacacaccag 
aaggcaaggc 
ccagttcctc
tctgtataac 
ctgagactgc 
ccttgttgaa 
ccagaaggct 
ttcattactt 
ttaacgcggt 
gctcagaaca 
agaccatttc 
tatttggatg 
ttgcagccac 
acatctcccc 
gccattgttc 
taagaaggga 
ccaaaggccc 
aatctgatgc 
taaataaata 
ctgcaaggtg 
aaaggggggg 
aaggaaggaa 
ggaaaagagg 
aagtgtgctg 
ggcactgaag 
agagagtgtt 
gcaggcaggc 
tgttcgtgtg 
aagtgacatc 
ttggagtagc 
acagacaact 



ggaagcgaga
tcaaaagaaa 
ggggccagtt 
tgagcttcat 
tagcatggtg 
gatggaaggc 
agtccctttc 
gcaagcgccc 
acgccaccag 
tctacttttt 
ggcttggtgt 
agtcggcatc
agactttaag 
cacagcttgt 
ttgcttccgt 
cttcagtgtg 
tagtgccttt 
ggggccctgg 
tgctgacagc
gctctctctg 
gcaggtacct 
gccagggcag
tttcagtctt
agccatgggc
acaaacaagc 
gtagataagc 
agtactgtgt 
ggtatagagt 
atgcatgtgc
acatgctaaa
agaaacattc 
ttttcatgtt 
tcagtcctta 
acagcgacac 
acattcattc 
cactgatgga 
atagacctac 
agacactagc 
tgtttccagg 
acggagggga
gtatttgatc 
atggctttgt 
tgctgtggtc 
tctttactgc
gacagggggc 
tgagttcaaa 
cctcttctgg 
tttaaaaaaa 
aaaaaagaat 
gactgaaaga 
cagtcaagaa 

tgggtgcatg 

tgggctgtgg 
caaggatcca 
ctgatttaaa 
aggatgctta
aatcctgggt 
catcttgggc 
tttgctccct 
gtgttgtgtc 



gagaagttat 
ccacatcagt 
ccatccaggg 
gggagaggcc
gtcaggctga 
ggggcggatg 
ccagcccagt 
ttttttaagc 
cttctggaga 
tttttttttt 
ccttggcaag 
tttgatcttt 
tatgactgga 
gaacccataa 
ggcttcctgt 
ctttgggagt 
ggtggcctga 
ctgggttcta 
actgctcttc 
cctgaaccac 
gcccaccact 
tgagcaggca 
gcacttgacc 
ctaaatggtt 
aaacaaaaca 
tcaattggta 
aacacctggc 
taagaggatg 
acacgctcat 
acaaacaata 
ttcaggaagc 
gaagcaaaac 
tacctaaacc 
accaagggcc 
caccttcaat 
gggaagcacg 
aaaatcagag 
gttcacaacc 
cataaatggg 
ggcagggtag 
atgaatagca 
tcagggactg 
acaattctgt 
ccttgttctt 
tggtgagatg 
tcccagcaac 
agtgtctgaa 
aaaaaaaaaa 
gggaagaaaa 
cataaaagaa 
agaggaaggc 
cagggcgaga
aaggtgtctg 
cagacactgt 
gtccatgcgg
cttagcccat 
ccttcctgaa 
cgctgacttc 
gggaagccag 
tgaggaggct 



16200 
16260 
16320 
16380 
16440 
16500 
16560 
16620 
16680 
16740 
16800 
16860 
16920 
16980 
17040 
17100 
17160 
17220 
17280 
17340 
17400 
17460 
17520 
17580 
17640 
17700 
17760 
17820 
17880 
17940 
18000 
18060 
18120 
18180 
18240 
18300 
18360 
18420
18480 
18540 
18600 
18660
18720 
18780 
18840 
18900 
18960 
19020 
19080 
19140 
19200 
19260 
19320 
19380 
19440 
19500 
19560 
19620 
19680 
19740 






gaagagctct gagcagcctt tagacatgat acctctagcc aggtccacca ctgaagcgaa 19800
taggggcctc tgggcttctc ttgctgcctg acacttccta tgaagaagaa aaaagcccag 19860 
ctcatgattc tgtcgcataa atatctctgt gaacttcctg tctctgctga gccggcagtg 19920 
ttggggataa ggactgtgaa aactggggta ctgtgggttt tcatgcacgt tcagaagaag 19980 
agaggctttg agatggtctt tcccacagac gccaaggtta ctcttccaga cagaaggaaa 20040 

ccccactagc cagagcctgt cgtgatctgt gtagatcaga cgcgtgactc aagttcctgg 20100

tccctggtga tgtctgtcac attgctctgg gcagttcggc atctggagaa ggaacaaaac 20160 

cacgttttca gagctggccg gtcacaagcg tcggggaggg cgaacggttg agccgttgtt 20220

atgaagttca ggaagcgggt ggcagctgtt attgttctaa gtggtcacca agtgaaataa 20280

agattccgtc tgagcctctg ggcaaaggat gtggctcaaa cacacatttc ctgaggttgt 20340 

acaaggaaga atgcagaaca ggctcagaag ctagcatggc caatgtgaag acagggagac 20400 

ccctttgtgt cctggcaacc atattctggg gccctgtgtt ttaactttgc tttattttat 20460 

tttattttat ctgctghaga tataggtcta ttttctaagc ctctgactat ctcaggacat 20520 

cttgcagctg ggttcatatt taaaagggaa attcttgggg ccaagtttcc cttcagcatc 20580

ccccagaggt gtcgttccat agcctttgaa aatgtagtat gaataagtcc caagcaccaa 20640 

cccaaggcca gctcgtgcca gttgtagtag agagacaggt attacctttt gctcgctgca 20700 

tcctttcctt ctgtggatcc tgctgagatt gatcaatttg caacacaggt ctttcaagac 20760 

tcaaaggttt ttcttgtttg actttttttt tttttttaag ccagtttatc tgbtbtattt 20820

ttctcagatt ctactgagaa agcccagtcc tgccctcatt tgccagcctt ctggaacact 20880 

cttactccca gaaacaaaca agcaaacaaa caagcacccc tcacctgccc cttctcctgt 20940 

gcgcctggag ggagagctaa ttgattctca ggtgtctggg ggtaggggtg tttacgctct 21000 

gccatctctg cctctaaatg gcttatcctt ttctccccat ctctgtgcac ccaagccgtc 21060 

ttgttgctaa ttctccatat cccagaatgt tatggtctgc tttgtttgtg ttttcttctc 21120 

ctgcatagca cagagctgac actaactgtt gataggctaa gcccagggaa cagtactcaa 21180 

ttgcagaggg gaaggactcc gtaggggcgt cttatcaaaa agtctttctt agaaacagac 21240

acaagacaaa atggctccct ccctcccaac tagaatatga tcattggaat tgctgctgct 21300 

gtttcacaac cgtgagaaaa ggtgactgga gaatgggacc atgcagacag cagagcagaa 21360

agataaaatg aagccaggtc ttgaatggtg ctattggatt agctgcatcg aattaacaag 21420

tctaggagct ggcctgccca tgggcttcct gttagtgaga tgactctttt gtgttcaagt 21480 

acatatagga ttctgaaaca cgtagctgaa agtggtccaa catacaaggt ttggtctatg 21540 

ctagggggga atctcaaaaa aggaaaaccg tacacttcag attgagcacc gagagtcact 21600 

gcacaggtcc acaggctgtg gtctgcaacc tcgatcacac ctctctcctt agtcctggcc 21660 

agtgcatgct gccgggattg gttctgtacc atccagacat ccctaacgat ttgtctctgg 21720

actcaatatc ctattgctct cactcaaatc agaaaaagta caaaaatcaa tcagcatgct 21780 

agcgacctct gtgccctgga aagatgtctc catttctgtt tctccatggg atgatttagt 21840 

ttctcatttc agctttgccc tttgagagca ggtgtcattc agagttgtat tgagcagatg 21900 

agagtctttc ctcggaggaa ataccatttc ccccagtgga accttgctgg atcactcata 21960

acatagccga gggcgtgggt aaggtggctc ccaaggaaag tgctggccac acaaacatga 22020

ggatcagagt ctagaaaaaa gaagggctgc aatcccagtg cgggtggggg gggggggggc 22080

agaaatagtt ggatcccggg agctccctgg ccggctaggc tagcccaatt ggtaagctct 22140 

tggctcagtg agaccctgtc tcaaaaagta agacggaagg gctggagaga ttggtgatcc 22200

agcagctgag agcactggct gctcttcctg atgaccagaa cccacgtggt ggctcacagc 22260 

catctgtatt tctaagttca aaggctccag taccctcttc ttgttgcctt gctagggagc 22320 

aagctcacaa agcacataga cacatgcagg caaaacgccc atccgcactt aaaaattcaa 22380

ataaaagaaa atgtaagtgg acgataactg aagtcaccct gcatttgcct ctggcctcta 22440 

catacgcatg cacacatgtg tatgcgcaca cacacacaca cacacacaca cacacacaca 22500

tgcacacaca agaacagact cctcaccctc cacctgccag gtgattattc aattttatgc 22560 

catggtcctt atttattttt tgaggttctg aaacccttct cagcccacag tgagaatcca 22620

gtagctatca atggattcgg tctgcttgca tcttgagaag acagaagagc tgcgggtccc 22680 

actggacagc aggatbtgct ctgccctgct gcagttgtct ccacagtgct gttactggaa 22740 

tccatgcctt ctggcacacg cttttctccc ggtggtcaag ttcttccgtg ccctgagaag 22800 

acttctctcc ccagtgactc actcttgctg gaactacctc ctgcatttca gtactgagat 22860

tccaactatg ccaccaagct aacctcagtg cctagtcact ctctactgac cctgttacca 22920 

gcagctcttt catctttcct tcctgtgggt tctctctcct gggtcctcac gacctcatgg 22980 

ctcaaggcta tggggtcctg tccaggtggt tactcccgaa gccatgtttt cagagaaggc 23040

tcttagccag gtcttcggat cctagcaccc aggatgctgt gtgcgttctc ttaaagatgc 23100 

catctgtgct tctgaacctg tctcctcctc aggagtgctg acgcactgtg gcatggcttt 23160 

cttacagagg tttaaggact gaaagaggtg gccaaataag cacagagtat agcataccac 23220 

tgtaaccagt gcatgatggg gcaaataaac cagaagtcca gaggctggtt agctgccacg 23280 

ccaacacagg gtgatccagg caaggctggg gacaccagag gcagaaaaga gcgagacagt 23340 
ggtgagactt aagaaccacc aaagtctctg gagtttggag gaaggcagct ggaatgccta 23400
aggtggaact tgaaccaaga ttgagtatcc agggggtcct gactggcttc caatcttggc 23460
cacgaacaaa gtgggagacc ctagattctg tatctaggaa aaattgctgg cacctgctca 23520
ctggggcgga gtccttctac tttggaagaa ccaggaaggc agaggcaggg gatagggtca 23580
tggctggtac ctggaccact ggccatgtct aagagagacc tgaggacata ggggtcagag 23640
agggtcccat gacttcacct agggtcttgt tgccattgag tctgtttagt gagcctccat 23700
ggaggcaggg acttgggata cactatttca tttaatcctg ggaagaggaa tggtactcta 23760
atggaaacaa gaccagtgag atggctctcc agacccacct gtggctgacc ccgccacccc 23820
caatgtcgct ttcagagcct cttttccagg cctccccatg ggaaaccatg tttgcccaac 23880
acccctctcc actgtaccct gttctgtttt cctaacatcc cttagcacta tctacaacta 23940
gttgccagta gcaacataac ccagtagtat acagtcatac acattgatta tcttgtattt 24000
ttggatgcca gaaatccaaa aatggtcttc actggctgtg ctaaaatcca gatatgagca 24060
gagctctgca ccatttctga agacttcaga aggaaatctt cctctgtcct tcaagccagc 24120
aatggtggtc aatatgtctc acagtgccaa cagaatggcc ctgtgattat cctgattaat 24180
tggtatcctt ttgcatctca gagcagaaca tgagcaagcc tagctcactc atgctggatg 24240
acagcttatt cacaggttct gagatgggga catgaacatc tatagaaccg gatggggaaa 24300
tcattattct gcctcccaca gcagctatgt tcatgaccta tccattcagc tatcctacct 24360
atctccatcc tgagaccaga agatggatcc aagagaactg agaccttgcc tgtcttgtac 24420
tgaagtgcca gaaagccacc tacactcatt aggtactcaa taagtagtga gcaaatgcag 24480
gctctgcatt aggtactcag tgcagataat gtaaatacta gatatatcca ggcacccatt 24540
aggtactcaa taatacttga tacatccaga taccagttag gttctcaata aatactggat 24600
acatgaaaag gaaggagtat tttttgtgtt tccaaatagc cctgtccatg ctatttgcca 24660
aggtgccttt aggacctctc tgcacaggcg gatttctgag ttcgaggcca gcctggtcta 24720
caaagtgagt tccaggacag ccaagactat acagagaaac cctgtctcga aaaaccaaaa 24780
aaaaaaaaaa aaaaaaaaaa aaaataggac ctctctgcag cagccagtaa gactcgaatg 24840
gcctgggatg atgccagggt cgtggagcat ctggagcaag gctaatcaga cagttgccct 24900
gggctgcccc ttgaggcagc tcctctctgg gctggctccc agggccaagc acatccactt 24960
gtctgcctgc tgtcactcct gtctcccaga gtaacttgag ttttacggaa agtaaagaga 25020
gaagctagaa gggagggcca ctcggggatg taattaagga taaaaatatg ccatgtcctg 25080
ggctctggca acatggaaat ctagattgct catgttccca ggactgtgtt gtctgtatcc 25140
aagtggtccc aaatgctgtg tctaaattct tagcagggac catttaaaag ttatgtgtgt 25200
gtctgcctcc ttccttttct ttttcttggg atgccaagaa ttcaacccaa ggccttacat 25260
accctaagtg agtagtactc taccactgac ctacacccca atgccccaag agtacatcta 25320
aagacagatt ttgcttgttc gaagtgggga gggggaaatg catatggtaa gaagatgaac 25380
ttccccggtt gtgtgtacga ggagaaggct tccttaaagt tagctggcaa acatgcaagt 25440
aggagaaggc tgatatttcc tgtcaccaac atcttctgtc cagaaacaca gcctaaccaa 25500
taagcccagc tttcactatc tccagtcctg aaggacacct cgccctcact gcctgcccag 25560
ggctcagtga ttctcttaag ataaaggaag agtgggtgga ataatcttga cctcaagatg 25620
gaaatgattg cacaggaaga gcaaggggtg gggagattaa tgtggcctgc gaaaaaaaaa 25680
acggagaaag caagctgagg gatccatgtg ggcttttgtg acacccggga ccactgcaca 25740
ctatagtctt gctatagatg ctatatagtc ctgctatata gatgtggcta ccatctgaga 25800
ggtgctgagt caacatgtgc tgtctctgct tttttttttc tttaaggcaa gtagaaagca 25860
ggaatcaaaa ataccaaaca caagtaaaat cccctcacct ggaggccagt ttctcttcct 25920
gggaaagtat cagtgtgaat acacttctct caaatgcagc tgtagaatct gtaatgtctt 25980
tggtacctta aaaatggcgt tgaagcatgg tgtgggctct cagaggagtt ctttgaaggt 26040
tcatggagga gagaactgca aaaaacttgg caacaggggc tggggcagtg gtgtggtccc 26100
cagaaagcac ataatggtgc ttggttggta tgaatatgat taaattccca gcactgggtg 26160
gaagagggac aggaggagcc ctggagctta cccagcctaa caaaatattc tttaaaaaaa 26220
ttcactctga caagaggtct ggctgcttga ctcggggata aaggcacctt ccacactatc 26280
cttaccacct gagttttgat ctctaggatc cacataaagg tggaaagaga gaaccactcc 26340
acaaagctat tctctaacct acatacatcc tacacacaca cacacacaca cacacacaca 26400
cacacacaca cacacgcttt ctctagtaag aaagagacct acatgaagga atattcattt 26460
ttattgtaca tttggtcaat tcacctcgcc taaaagctca tcaagtgagc ctcccccacg 26520
atatctctta gctcccacca agtcatcttt gtctcagtat ctgtaagttt cctcaagcaa 26580
tggataaaat ttgcatcctc agtattttgg tttgcatttc ctttattaaa aataacaata 26640
atcccttttc acatatgttt tcatcatttg cattcctttt cttataattt gcttatttat 26700
ttatttttcg tttgttcatt tgtttgagat agagtctcat ttgtctcagg ttggctctga 26760
tctttctgct tctgcctcca gtgctggaat aacaggcgtg tttctcccat gctcagattt 26820
aatattttta tttaaaaaaa aaatcttttg gggcgggaga aatgtctcag cagctagaag 26880
tacttgttgc tctttcagag gatgtatgtt tgattcccag cacccctgtg gcagcttata 26940
atcatccatt tcaggggacc cactaggcac atatgtagtg cacatacatt tgtgcaaata 27000
ggacatccaa acacataaaa taaacaaatc taaaaaacct ttctaaataa aaatatgttc 27060
ttgccaggta atggtggtac atgcctttca tcccagcact caggaaggaa aggcaggtag 27120
aactctgagt tcaaagccag cctggtctac agagtgagtt ccaggacagg caaggctaca 27180
caaagaaacc ctacctctga cctttgtcct cccaatatac atatatttag taaactaatt 27240
tttatttgat tagtttactt tttcttttaa agggttttcc gtgtgtgtgc gggagcgtgt 27300
gcgtgtgggt gtgttgttat tgttgttata tatgtgtgtg gtatgtgtan nnnnnnnnnn 27360
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 27420
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnaccacc accaccacca 27480
ccacagccac cacagccacc accctgctga caaggctcgg gtcacatgca cacaccacta 27540
tgccctccct gctctctatg tagttctggg gatctgaact cagaaaattt ataaactaaa 27600
tcatctctct agcctgtctt tttttccaga ttatattatt atttcacaag tctatcctga 27660
cttgtgaaat tttttgcttt gttttgtttt gtttttcgag acagggtttc tctgtatagc 27720
cctggctgtc ctggacctca ctttgtagac caggctgacc tcgaactcag aaatctgcct 27780
gcctctgcct tccgagtgct gggattaaag gcgtgcacca ccacgcccgg ccctgaaatt 27840
tttaaacaga ggctgggata tagctcagtt ggtacttgcc taatgcttgc tagaacacag 27900
ggagccggcc ctgtgatcag tcctcagcat cataaaagca gcatggtagc acacacctgg 27960
aatcccagta cttgggaggc agggctggga cacacagaag atcagggtca ttttcagcta 28020
aatagggaat ttgcagccag cctgtgcaac acgggcttct gttataaaac aacaacaaaa 28080
tcattttagc ttatcagttt ttctcctgtg gattctagat cttgtgactc accaacctta 28140
attattgggg ggaaaatagt attttttatt ttctaacact tttatcatgt ttctatgtca 28200
agtgctttat tttgacattt gctttaaaat ggggactaca actcttcttc ccggtgactc 28260
agcagcctcc gatagtaaac agtcccccct ttcctgccga ctcacggttc ctcagcccat 28320
atgctaacgt cccacttcct ctcagatgct ggatgtagac tcctcctcct gttccactta 28380
ttgattagtc tctacttgca ataatgcagc gatgcacatt ggtagcagtg acattcttat 28440
ggcctgtttc ggatttctgc tccagaagga gactataaat caagaccagg gcagcaaagg 28500
ggaagaggaa gtgggtaaat aagatggggg tcagcatagg tcctgtatga aggttccagc 28560
cagtccactg gggacttctt cttaggtaaa gctgaatgct aggtgagtgg tcagaaccag 28620
acaccaggaa ctctcagggg ataacctgtg catcccacaa tatgctcctt aaaggagaaa 28680
gacatctgac tcctaggacc attcctaccc cagctggttc ttcctctcac aaggagagtt 28740
ctagccacca ttctgcatgt ccaccctaac ctatgatgta ctctcccttt cccaaaggac 28800
cttaggagcc cagtgctgca gagttcactt cctctcccat tcagcactct cctaagagct 28860
cagcttgacc caccctacct ctgggaaggg aaggcccttt tatttacact gcgcttactc 28920
ttatcttcct tgtgtgcact ttttcctacc tcgatgcaga gccccagctt ggttttctca 28980
taaacttccc attgtctggt ctagcatatt cagatcaccc agggtctgct tgtgcttctg 29040
tcacccagaa gggtggagat gcctcctagc tgactggcaa ccctgtaagc ccacgtctgg 29100
tgtgaacagc tgatggcatg gagagacgta tgtctgttct gacttgactg ggatctgact 29160
gagctggtct tttttctggg tggtcctcaa cagctgatga atgaatctca caagatgctt 29220
tgcccaaatc tacatagatg agcttgatgg tggcaggcag gtggctgctg tgatgttgta 29280
gccatcaagg cttagcactg ttggttatct gacacatggt tttctttaag tatggaccag 29340
tgagaaacag ttcttaaaat taacagattt tttttttcca cataattatt tgaagttctt 29400
agctcctttg cagggagagg gggggggggc tccaatgagc ttgggcttgt tccctccttg 29460
cctaccatct caccccagga ctgttctgct tctgatacag aaaaggatct tcttatgggg 29520
gaagggggag ggacttgtgc ggtcccatca ccccctgaag atctgtcggt ggttaacaat 29580
accaggggag gcagtcagtt tctccagtgg tgtagctact ggccaggcat ccatgctcct 29640
gtcaataacc tctcatctat gctctcttat gaacaaccca aactaaactc atcggcatat 29700
gtgtgtgtgc atgcatgcat gcgcttgctc tcatgtgcac acgcgtgcgc gajacccacac 29760
tgcataaatg agagataaaa gaggggatta gaaacagaaa gagatcagca ggtatgtgag 29820
ggggacaaca gagggtcata ggaggtaaat gtgaacaaaa tgtgtgtgtg tgtgtgtgtg 29880
tgtgtgaaaa cctcacaata aagctgtttt ttctgtataa ttaatatata taaatctaaa 29940
agatttaata aagagcctaa tccttaaaac catgtttcac tgactggatt ggtccccatt 30000
gcacacccac tgtcccctag accgctacat gctttgcatt ttctcataga ccctgctttg 30060
gcagattgcc ccgacctgaa accttaccca agacgcacat aaccccacca aactgtgaac 30120
tgcagagtaa gccttctcac ccattgtgaa agtcacaact tccccgatct gcccttccct 30180
ttaggcctcc cctcccccaa ggcccagttc tcttgctcag aggtctcttc cttccctggg 30240
atggagcttg ggcgtcacag cagcaggtgt aactgcatct gtttgcctgc cttggatctc 30300
aggctagact gtgtcttgct cagttccttg gagcagcaag aggcgtcaga ggtaactttc 30360
ggttcctcta aaggctggtg tttatcaaag tctacctaac tgaacttctg gcctccagag 30420
gtctggggcg tgccatgctg aggagctgat ggacccatgc tatcatgaca gcttgccaag 30480
cagagaacac tttgttcttg tccctttccg gctcttctca gcctcagaca atcccaccac 30540
ccaccacggc aagctggaaa cttcctgttt tggtctgggc ctttgccatc tttaagttgg 30600
aaaatctttg tccttgatga atgctccttg ccctctgcat tgccccagaa gttattaacc 30660
tttttgtgac cactgtaatt tcttcaggat tcaatgttgc cttctaatcc tggggtcatg 30720
cagaaggaaa accaggtata gacaggaagc ccagtccgta ggagatgatg ctgttagaat 30780
ggctgatgtc actccctatg acacatgagg tgacaaacat ctgagagcta ttctcctata 30840
aggaaaacaa tgcccccaat cttctgccag tgtctgtggt gtcaatattc ccaataagac 30900
gcccactaga atggctgctt gggattgaag tcagtggaac catgggtaat aaccctggac 30960
agtgtgatat aaactcaaat tggaagagct gacaaagtcc tgactctaga ctttgagcat 31020
cttgatcaga gcctcatttt cctagttggg gtttctgttg tcatggagaa acaccattac 31080
caaaagcaag ttgaggagga aaggacttat ttggcttaca cttccacatc acagttcctc 31140
attgaaggaa gccaggacag gaacttcaac aggacaggag cctggagaca ggagctgatg 31200
cagaggccac ggaggtgctg cttactggct tgtttctcat ggcttgctca gtctgctttt 31260
cttatagaac ccaggaccac cagcccaggg atgaccccac ccacaatggg cttggctctt 31320
cccccatcaa ccactaatcg agaaaatgcc ttataggatt gcttacagcc tgattttatg 31380
gagtcatgtt gtagacaaca tttaatctca gtctgggttt ctaccctgtc tttgatcatt 31440
caattcccag ataaaagaca cacaacatga ttatttataa taagctttta atgcactaga 31500
gctgggtaga tatctaccct ctaaaccatc tgaatctact tccctaccca taaccccgag 31560
ttgtcacttg ccatgttcca tctgggccac tcttaactcc aagtggccag catggccatg 31620
tttctataat tcacctaccc catggtaact tctccacatt ccatcttctc tctttcctct 31680
cgtggttttc ctccaagcct gggaactccc aatccctgcc tgtctcaatt ctgcccagct 31740
atagactgta ggcatcttta ttcaccaatc agggataact tgggggagga gacaaggtta 31800
cagagctctt gggtctatgt gcagattctc ttgtccctgg gggcaaccag gccttgggga 31860
ccagtattta gcattattat acctagcaaa agaccaaacc tccacagagg cattttctca 31920
attgaagttc cctcctctcc aatgactcta atcttgtgcc aagttaactt aaaaatagcc 31980
cacatacccc tgtacctcag aggggtcagg caagttcaca gtcacttcca cctgacttcc 32040
ggtcttttca ggccctaaac ccttactcag agtcatttta attcagtatt ttcttttctt 32100
tttatcacct ttggggaggg attttttttt ttcccaagac ctcttttaca gacttgactg 32160
tgtgatcaaa cttcccattt ctctgagctt cggtttatgc ctctgaaaag tggaaggcaa 32220
ctgcccttag cagatgactt accccgggct tccagcagca actaactctg ctctcggcca 32280
agcagatcgt ataatcctct tccacagaca aggagagtag tgtagcctga gagcgttaag 32340
tatcacagcg ttcacactcc aacagcaaaa tgccccaatt agaagcaact taagggaagt 32400
aaggtttact tggggtcatg gttgagagga tatggtccat catggtgggc aagcaggata 32460
gtgggaccac aaggcttgcc ttactgtatc cacagtcagg aagctgaagg agatggacac 32520
ctgtgttcag ctgcctttct cctctttcat tttttttttt tttatagagt ctagacctcg 32580
ccccatgaga aatgctgcta cctgcattca gatggtaaat ctccctcagt taatcttctc 32640
tggaaattcc ttgacaaacc tgttgagaag tttgtctagt aggggattct aagtcctgtt 32700
aatgtggcaa tgaagattaa ctacaaaaca gtagagcctc gaattaactt ctcttgacat 32760
ttgacatttg taaattctag caatgctagg tctatgttcc ccacaggcta taattagaac 32820
taaatggaaa caacttcaac agtagcaact cctcctcctt ctccttctcc ttctccttct 32880
ccttctcctt ctccttctcc ttctccttct ccttctcctt ctccttctcc ttctccttct 32940
tcttcttctt cttcttcttc ttcttcttct tcttcttctt cttcttcttc ttcttcttct 33000
tctcctcctc ctcctcctcc tcctcctcct cctcccgcnn nnnnnnnnnn nnnnnnnnnn 33060
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 33120
nnnnnnnnnn nnnnnnnnac tttaaatttg taactactta tttgacaggg tctcattctg 33180
taaccagccc taaatttttt acaactttct acaaatggtg cttttggcct gggtataagc 33240
actcaaacca tgaatgggaa cattgtatat ttaaactgta gcaaccctct gaggctttgg 33300
ttttcaagtt gtcagtttga gagccctgac atgggcaagc taacagaata ggactgtgcc 33360
cagaaaaatg agtgctagcc cactgcccct accttcatct ccattgggaa ggaagatctg 33420
gggtctccac acaggctgct ccaaaacccc acccaaacta ccttctcagc ccaacaacct 33480
tgaattttcc atcaccttct tcacctccat ccactcacac agaacgaccg gacactctgt 33540
gactgtgttt atgttctcag tgtatccaaa ttattacagc tgagcactgc agatgaccca 33600
aggcagctgc aggttttata cagaaacctc taaaagtctg taaacctggg atggttttat 33660
atgatcttta taaatttgac acttttaaag aagatctctt tgaaaaacct ttccctccat 33720
gtacatgtgt atgtgcatgt gtgcacacac ttgcacgttt acatgctaaa aaaatgatct 33780
gcattacata aactgattct aataatgctt ttcataaccc atgttgaaat aatttttttg 33840
catgtaaaca actatgatat aggccatttc cagcctggtc ttttttttta aataaataaa 33900
taatttattt tgaaagtaaa ttgacttagg aaaaatttta aaatagtaca aagaattgca 33960
tttttctctc tcatcttccc actgagcgtt ttgacgtata atacactgat cattctctct 34020
catattgcac tgctgttttt ctgaattatt ggagaggaag ggtctgacag taccctatta 34080
tcactatctc attcatagcc atattcagcc atcaaaatta ggacattaat gctggcctac 34140
cgtccactcc catagctcca ctcaaatgcc tccagcagtc ttagcagcaa tataaaacat 34200

cctatttgtg taactatgtg aactataatt tatacacata aaataacttt cctcctggac 34260 

caatgtttag tcgccaaacc tgggggtgct gttgtgttgt gccttccttc atgatattga 34320

cttttttttt tgtttttgag aacttggtag taattttgta ggctgcttct tggtttgagt 34380

tcttcagggc tttcctcgtg gttgggtcca gttctgcatt tccatttcca gcaggacttc 34440 

tcagtggtgg tgctaaactt actactcctg ctcaaggtga catgccggct gctctgcgca 34500

gttgttagag tacgctctga gtgtgaagct aagataacat ctgaaggggt ttctctttgc 34560 

aaaattagat atttgttatt tggggtctta atattccctc cctctctccc tccgtccctc 34620 

cgtccctctc tccctctgct taggattgaa accagagtct cacactattc tgttcttggc 34680 

cactgatcta tcttcacacc ctcaggaatt cagggtcgtc ccaggcatca tagtaagtcc 34740 

aaggccagct tagaccacac gagtccctgt caaaaagaaa ataaatgact gcagtcaagg 34800 

atagcagcac agcttgaaat cccagcactc agaagcacag gcaagaggat caggggtttg 34860 

agtccagctt cggctgcata ataaaaccct gtctcaaaat aaacaaataa aagactgatg 34920 

actaggctga tagctataaa caacttgaat gtcaactgcc tagcataccc aaggctctga 34980 

attcaacccc taacaaagca aataaaataa aactagaaaa ttatacacac acacctgttt 35040 

tgtggaaata atccgagact atctaaatgt gtcatctctc aaactctcac ccacctggtc 35100 

ttagcatccc ctggtgaagt gcgtctgcat cagtagttac catgatagct gccaaatgct 35160 

gactgcctct tctatcattg gtcccacagt tattaattga catttcgctg tagggaaagc 35220 

ttcccttctg atttatttat gtaaggataa agttgtgaat tctcatttta cccagtggat 35280

tgtaactatc ttcattattc atttcaatgt tcagagtgta ctggctgctc ctcctgctac 35340 

acttggtggt tcctggctct ttttgacaaa ttgtcttccc gttcaaagca tggcttttct 35400 

ccctagaacc tactagtgca gtggcccttt aatatagttc ttcatgctgt ggtgaccccc 35460 

aatcataaaa ctaatttcat tgctacttca ttaactataa ttttgctact gttatacatc 35520 

agaaggtaaa tatttttgaa gatagaggtt tgccaaaggg gtcacaacca atagtttgag 35580 

aacctctact ctagaaatat atatctcagg ttctgctgtg ccttcccagc cccagattgt 35640 

aggctcttag ctgaatttaa gaacctctgg gtctttgtaa tggggaaagc acccaagatg 35700

tgaacagtgc tcattgctgt tagtctctgc cctctcagca gacagaacca agaaggtggg 35760 

ggaggaagtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgtgtg tatacaattg 35820

atactgttac ttaccattca aatgccaaac aacagaattg ctctgcaact ccctccctag 35880

cctggcttgc atgcttatgc ttgttcctta gcaggtnnnn nnnnnnnnnn nnnnnnnnnn 35940

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 36000

nnnnnnnnnn nnnnnnacca tgaccccagc tcggaagttg gctctagcta cttcaaaggc 36060 

gacgaaacaa tgaaactgtg ctaataacca gaggaaaggg agctgggcat ggtgggtcat 36120 

gcctataatc ccagtacttg ggaggctgag gcaggaggat ctcaagttcc actctagcct 36180

gagctatatg gttccagagc aaaatccttt ctcagaggaa gaaggaaggg ggtgagctgt 36240 

cttctttggg gatatccttt agagaaacca ctctcagccc tagaactaaa gacaggtttc 36300

tgagtctaaa gcagtccccc aaactattct cagcccaagc tgcccctttc cccaggcttc 36360

tgggtctctc tagctggcca cgcctcctca gtctattagt gttctagctg taccagatcc 36420 

ctgtccctgc ccctttcccc tttcttccac cctacctccc tcattcaagt acagactctt 36480 

accctgtact gctgccaaag tgatgtcatc ctgaccagca gtggtttgca cctgtaatcg 36540 

ccacactcag atgatgagtt tgagtcaagt ctgggctata atgtaagaac ttgaatcaat 36600 

caataaataa aagttaaaat atcaatccta catgaatgca caccccagct cttgagtgct 36660

cccaaaagct ttcaaaatta ttgttaaaaa taatagtaag ttctacaaaa tcccagaggc 36720

cccagaaatc atgcgctctc cccttcaccc acatgaagtc tttttggcca aggcagagga 36780

ggtgagtaga cattgagaag aaggcagacg gaaggtagag gaaacttcct gatgactgca 36840 

gagataggag agggcagagg ggcctgtgga gagagactca ggagaagctg ggtcctagac 36900 

acactgataa ctgcataggg agatggcaga ccatgctatc taaaacacaa aggcctggtc 36960 

aggccagctg aactcctgag ctgcccctgc tgccaccccc cttccacagt cccccttccc 37020 

cttcccctgc cccctgccca gcacagctct cagatcttct gtgtaagttg tatctgtgga 37080 

acacccccaa gcttggactg agtctagccc ctgcctgtcc ctcaatcacc gtcttttcaa 37140 

ctggcagact cagggtcctc actaagaagc agaaataata aacacgatca ctggcacccc 37200 

tctgcccgca gacaaatgtg ttttcaccct gctttcaccc tgctgcctaa gccttgtctt 37260

ctgactttct ccttggacct gaagagggtg ctttgatttg gtgtgaatct gcatggagct 37320

ctttggtttg ggttccccac ttcccacttc ccaccatcta taaaatctcc atggtaactg 37380

aatgggccgg cctcagaatg gctaagtagc taaagatcgc cttgaatccc tgatcctcgg 37440 

tctctgcctc ccaagggctg gcatcttggg cctgcagtcc tgctctggga agtcttttca 37500 

acctaaaagg aaggctaccc caaccttcag gagtggccac ccggagcacc acacttctgc 37560 

tcagcggctg gcagacttcc tcccctgtgc ccagtctatt tcagttctgc atttagagat 37620 

gtcagacagt gctgacatta aaactatccc ctgggatgtt tcagcttgtc tagggctctg 37680 

accgcccagg cctgctcttt cctgctcccc acccccaccc tgcccccagg gtattttcag 37740 
tcatccccag gactgcatct gaaaaagcag ccagcagtct gaccaccagc tgaccttcag 37800

atgaccttag tcttaaaggt accagtgccc tccctcctct tcctcaaacc tggacccccc 37860

cagcctgaag gccaaccaac cctccatacc caaaaacctt tgtgcatggc tttgtacctt 37920

ctgggatatt gctccctggt agatgcagat gccaagagat agaacaagat tttccctata 37980

agatctagct gggcagtcca ctcccctcca cctccctgca ccttttatnn nnnnnnnnnn 38040

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 38100

nnnnnnnnnn nnnnnnnnnn nnnnnnnngt tttatggagc agatgtatct cagaacagtg 38160

ggctgtggtt atgtaccgta ccagtttccc ctgtcaacag acctggctat aagacagggt 38220

ttttcaaggc tcttgtgaag ggggggggtt ggtaaagtag gaaccacagt tatatcccaa 38280

ggcagagtga gctgtgcatt caaagcttga ctccttgggt tgttgtctat aaaggcctta 38340

taaaaaacat tgcttctagt tactctgtgt agacatatta actaatcaga gacaaggact 38400

atatttcatt tcgtgaacag aactaatcct tactttattg tcttgtgtgt gacttttgat 38460

tctgaacata ataaattcat gtccagtctg ttgattaagc aaatcccatg gagaggtcta 38520

tcagcctcaa gcctgggata tcccagttac aagcattatt ttaaagtagg aagagcctga 38580

ggctgagggt ggagcacagt gaaagggctc gggccacacc atgcaccaca cccgctcatc 38640

ttcaggttct ctctctcttg gatcaaactg cctgccacat gctatagtta atctgttaag 38700

gagagaagga attcatgcct gccaagtgga agaggagagt caccggggag caggggcctc 38760

acggtccatg ggtattgtgg ggtatcattc ctaaggggca cagtgagggt acatcagtaa 38820

acagctcaga aggctgttag aggcccacag ctaactcgag acctttgggg ttccttctct 38880

atttcaaaat gacttggtgt atcttttaag gaaggcctta gctgcagaat gaggaacaaa 38940

agctggcggc ctagacgcta agaatccaga gccacgcccc tctgctggca gggaactatg 39000

aactaggcct tgatctgagc ctcatgttcc atgtgacaca caggtgcttc hctgccatca 39060

tgattgtcct ctggttcatt cactcctttg agtttagcca gtgagagccc cagccaggtg 39120

gtaagacagc gagaggaaga tagtgaagaa gcgggggtgg gtcctctggc actggctccc 39180

tcctgctggg agttctcagt gggctccagc actatccttc tgctgggagg ttccaggggg 39240

gtaggggtaa actgcttccc tccgctgtta gctctagaac tcttcctctc tacctaccaa 39300

actctcccca aatgccagtt ttatatggcc atctgcttcc tgctaggact atgactaatg 39360

cctatgacac ctttcttgac cacattttac attttaaaat agtcgaaaaa gcaccacctt 39420

tgggattgtt gggagaaaag ggggtacacc aagccccacc tctcactgtt tctacctcca 39480

agcccaagca cacagaaggt atctaaccta gttttatttc ctgttgctgt aataacaaaa 39540

agcaactaag ggggtggcgg gaggaaaggt ttgtttttgt tttcactcat aattccacat 39600

tacggccctt tgtaacgggg aggtgaaagt ggtagccagg cagaggtggc gggtgctgtt 39660

ggtcccagtg cttgagagag acagaggcag gtaaatctct gagttcaagg ttagcatggt 39720

ctacaaagtg agttcaagga cagccagggc tacacagaga aaccctatcg taaaaaaacc 39780

aaaacaaaac aaaaaggcaa tactcctctc ctattccctg cccccccaca cacacacacg 39840

ccataggcca acctgatcta gacaattcct cactgagact ctcttcccag ataattctag 39900

actgtgtcaa gttcacaatt aaaactacaa attcttgcct gtcattctga ttccagtgtg 39960

gtaaccccac agaatcaggg gttgccaggc tcctcagaat tacaggaagt acggtagacc 40020

gagatgtagt atgtagaccc cgctcctgtg gggagggggg cctcttaggc aggtgaatat 40080

gtgcaacatc tcatgtggaa ggggacctct aggctcatta ctaaaggaca ttattgttcc 40140

tcctggccca cagtcagtac actgccacct agaagatctt cccacgtgtt ggtttaggct 40200

tagtggggtt tattggctac acagagccgg ccattcacta tccactctgc ctggacatag 40260

ggccttttaa gcccatctta tgaagacagc attttatact gtgcctccaa gagacccctt 40320

tctctggcct gttgggaccc atatctccac catctcctct gcctctgtct gctgcctgtg 40380

aagacaggct gcttcctgct ggcccacatg gcataacaat tgaaatttat ctcatgggaa 40440

tttgcttccc ttaggagggc ccttgtatcc tttctgggta ctagaagtac aaagcactca 40500

gagctgtccc tgagccccag ctcctgggta gtgaatctat attatagtgt ggccataagg 40560

cctttgcaag gacatttctc tgtcaaatgg acccagcacc cacctggcat ctctgcctcc 40620

tgtctcatcc ccatctgtct catcagttgg ccttgtatat gtgcatattc attgcataca 40680

taaataaata agtggatctt gttgctctga ctgagtacca caaacagggt cacttttaaa 40740

gattaaagtc caggcatgta gcttgaaggt aaagcaaatg cctagcatga gcaaggtcct 40800

agccagacac ccgtgaatag aggcatacag cttatttggc tcacagttct ggatgtagga 40860

tcctaggagc aatgtgctga cttctatcta gggaggacct tcttgctgta tcataacatg 40920

gcatggggta tcatatggcc agacagcaag ggtgagctga gattgctttt cctctgcaac 40980

aagagccact gacaccatca cgtaaacaca cacacacaca cacacacaca cacacacaca 41040

cctattacaa aggctttacc tccaaacacc atgtacatat gaatacaggg attgcctttc 41100

taataagtgg gaagacagtc caaactatcc cacacttggg gagttgtgcc tatgggaggc 41160

agaggaaagc ctcccttctt agcagcgccg ctccaagctg gctcatgctc atgcatccac 41220

ctggaggagc ctgcttcccc atccttgaca tttgacctca accctacact ccccatcttc 41280

ccggaagcct gacccaccta ggtggtcagc catcctggcc tgccaggccc tgctccctgg 41340

cctccatgac aatgcaagca tgagcagggg ccagggaggc atggctgact aacagctctc 41400

agcagtcaca actgcctcca gctcctgcct gctccttttg gggatccctt tagcttctcc 41460 

ttcctctgct tgcttaatac acagcctatt cacccaccat tgtttacccc atccccactc 41520 

tgcctgccac ggccaccttg aacatgtcta caattcaata gtgcagcata gagcctcgtc 41580 

tctgggtttg gttctggctt tttttttttc ctgctgtata agataatgtg catctcagcc 41640

tgcacgcccc ctaagcccca atcgcaccag tgttcacaca tgcagtcctt aggcactctc 41700 

cactccctga tgagactcct tccccagatt cactcaggag tcagcccatc tcaaatttct 41760 

ttgatatttt cattctcttt ccgaatgccc agtggcattc gaatcgattc gaatcatcga 41820 

ttctctctaa ccagccctga caagaccaca cacactttac ctgaagtgag ttcacaggtc 41880 

ttacatctta tcaaattatt cttgttattc cattcttgtt tgcatttgaa ccactcactg 41940 

gagaatattt tggaaatctt ggcaggggct gccactcagg tggtttaaca gtgctcacaa 42000 

gagattgagc agcgtggggt tttgaggaga caggggagtt tattgaaaag aataaacaga 42060

tgggtttgct aagctctcta taaatttact aagacaacct aagccataca gcatgagtgt 42120 

gtatatatgc tctctcacgc tctctctctc tctttgtcac acacacacac acacacatac 42180 

acacacacac gcacacgcac ggatttttgt gtgtaaaaca catgagcaca caagtacact 42240 

tggagagcgg gaagcacgca gagggaggcc tggagcccat atggacagcc cgagtccaca 42300

gagactgttc tcaccaaagg ccgggtgact cactacagtg tagaagaccc agaacccaag 42360 

gttcagagac aagtctgaga gcttcctagt tgtagcagct atttgtctca ttggtgtaac 42420 

aaaatacctt acaaaagcaa cttgaggaag aaacgagtta atttggctta cagttcaggg 42480

gatacagtct acagcaatgg ggatggcaca ggacgggaag cagaggccac tggccacacc 42540 

atgtccagag tcaggaaacc aggacgaacg ctggagtcag tttgttttct cctctttact 42600 

cagtcctgga cctcagacca tgggatgatg tcacctacat ttagggtggg ttttcccacc 42660

tcagtaagcc ccacctagag aagcccaaag acttgtatct atggtgagcc ttaattccat 42720 

caaattggca agattcctga gtctgtccca ttcacctgag agcataccaa ccctcccaaa 42780 

tcaagcagtc aaaccatcag gaataggaat ctaatataga tgtgactgag tcaaccacct 42840 

gccttccctc ctgtgaggac tttctaactc aaacccatgc tcctattggt ctggtttacc 42900 

ccccaagcaa tgggtttctt tcagcaagaa ggtagctagc tggccctaaa gaaatggttc 42960 

agcctctgat ctctaaacct gaagatggac acaagtcatg ttaaaatgat tggggcagct 43020 

ggatgtggca gctcatgcct gtaatcccaa cacttaggga gctgagacag gaggcttgcc 43080 

atgagtttga ggccaacctg agctactatg tagaacattg tcttaaaata aaataagtta 43140

aataatagtt agacaagagt gaactatggg cagacacttg gcccatgtgg agggatcatg 43200 

accccacaaa gccgcaggag actcactaga gagagtatgt atctcattag ctgcagcaat 43260

ttggtctctt gctctggtct ggctgtctgt agggatagcc atcttagcaa gggtatatct 43320 

aaatggatga aggggcatct tggtggggtc tgggctaact attcagggaa gaccttgaat 43380

ttcctactgg agctgcctct tccaccaaca tacaaaagag tcctggtccc agagtgtgag 43440 

agcagtagag cagagtcacc tccatcatca aaagcagagg cacaaactgg tgatgatccc 43500 

tgtgttgctt ggacaacttt ctgacttttg gctgaaaaaa aaaaaaaaaa caaacaaaca 43560 

agaaacatag ttaagggctg gatgggggac caagtggtta agagccttga ctgctcttgc 43620

agaggatcct gggttcagtt cccagcccca catggtggtt cacaacgatc tctaactcca 43680 

gtctaaagga tatgatgccc tcttgtggcc tccatgggca ctgcagtcac atagtacaca 43740 

aaacaatttt atattttaaa aaagtaacta agtctagcgt ggttactagt gcacaccttc 43800 

aatcccagga ctcaagagac aggaacagaa aaatctctgt gagttccagg ttagccaggg 43860

ttccatagtg agacccactc aaaaaaacaa acaaaaacat agaatttaaa caaatggaaa 43920 

aaaaagtaca agcatgaatt ttcctatgtt taaaagcaat aatcaagtgt ggtggcacaa 43980 

gcttgtaatc tttaagccct tgaggatcta atgcaggagg gtggttgcaa gcttgaggct 44040 

acgttgaatt atatagccag accccatttc aaaaagcctt aactaaatac agtaaacaag 44100 

ccacgtgtgg tagcacgtgt tcttcctcgg agggttgagc caggaggatg acatattcaa 44160 

gcttggtctg gatagtttag aggaaacctg cttcaaaatc aagtttaaaa cggcagactg 44220 

ttgatgcctg gtgacacaca tctgtcatcc cagcacccag gaggcagagg caggagatgt 44280 

aggacggatg tttccttagc ttgtgtacag tcctcgcttc aatccccggt gctgtaaaac 44340 

ataagtcatt aataaacaaa tagtcggggc ccggggtgtc actcacggta gggctatgtg 44400 

tgctttgcat gtgtgagaat ctgagctcta ccttcaatac caaaaataaa atataatttt 44460 

tagaaatggt cttggtgggg gggggagcaa aaatagctct gttcaccaca cacacaccta 44520 

aacttgtaaa aatagaactc ttggcccttt gagattatag gtgatttatt ttttttaaaa 44580 

aactggtatt ttctaatttg gatataataa aaaaatatat actaatttgt atagtaaagt 44640 

acttatagaa atcattttta acagatggag agaaaaaaat agttttaaaa atacttctat 44700 

ttgcaccttt gcaaagggct cctgggagcc actggctgcg ggatgtgagg cgtgagtgga 44760 

gagcttttct gggatatggt cactcctagt ctcttcagtc tctgctccag gtggcctttg 44820

tgcttgggtg acattctagc taaagtgtgc tctgtccggg gcttctgtcc tcaggcatcc 44880 

gaaggtgtcc ccatgaggtt cttcacgaag ctggaccagc tcatcgactt ttacaagaag 44940 
gaaaacatgg ggctggtgac ccacctgcag taccccgtgc ccctggagga ggaggatgct 45000

attgatgagg ctgaggagga cactggtagg aagggaagga agggatggca agtgtgggga 45060

ggtgacaaaa ggactctgtg gtccagcctc acgggtgctt ctagactaga gtacctcttt 45120

ggaggttgaa ggatgctttc acaggggtcc cgtaccagat atctgcattg aaattcataa 45180

cagtagcaaa attacagtta caaagtagca acaaaataat cttatggttg cggttactgc 45240

aacctgagga ccgtattaaa gggttgcagc actggattag acagtcttgc tggcacagaa 45300

atgattggcc gggaactcat atgactctga agtcaggtgt agactagaat tcctgctccc 45360

cggcttgttg cctttgtggc cttagggact tctgactctc tgaaccacaa tcttgccttt 45420

aaagagatta gataagagta gcccatatta atctctcagg atgtcacagg accagatgag 45480

cgcgtgaatg gaacaccatc agcagtgcac tccagaaatg aagttgccat cagtttcttc 45540

actatagtga aagttgagag gctccatgca cataagtgag agggtgggga agtgcttctt 45600

catgtaccat cactaaaggc aaccaaaatg gtcatggtgg gacatgccac aggctattcc 45660

aaaatttagg atgccaaggc aggaaaagca tgagtttgag ctagcctggg ctccatagtg 45720

agacaccacc atctgtaaag aggcttctaa aatatgtcgg agaaaaagag tttccaagag 45780

tttgtgaagg ccagcacaga gacaaggagc cagtcttgga gagacaggta tgcagaaagc 45840

cctgtagcag gacacccttc ctttcatctc aaaggttcct tgcctgacag aagcccacat 45900

gttacctcgt gagacgctca gcaggtcagt gagttcccgg caggttcccc ggggaaagca 45960

ttagacatgt agtgttgaca tcccaggcct ggctcctacc tgtccctgga gagaatatgc 46020

caggtgtctt gttatactgt gggctagctg tggattgcct ggcctctgtt tcaaccttct 46080

ccttcctgaa gaccctccag agggcagcca ggagagctgc ctcaccatcc tgagggagcc 46140

acagaggcac atggtgaagc atgtgtctta gatggccatg gtttgaagcc tgcttagatg 46200

ggggactgcc ctgtctcctc tgcctgacct ctagagatca gcagctgagg tatagctgag 46260

accaaaattc acaagaaggc cacaaggcca catggcagcc ttgacctggt tctgtagcta 46320

tgtccttgac agggcaccac tctctcttcc ctcaggctac catgcttttt ctcagcttgc 46380

tctgtcactg gcatcccatt agaggcaaac ccttcaatgt gtcctataaa ggctctggag 46440

agaatctgtt tcctgacctt gacttcacct ccccaaagct cctggtcacc tgcaagaccc 46500

aaagccaagg gacacccagt gcctggacca aggcatctgg tccatcttgg gtcttctcga 46560

gggcctgcag ttccagggct caaaataggt tcagatctgt ctcccacagc cagagggaaa 46620

ggaatgtaca aatacactga agaaaaaaac ccttgtttct gtcacctgtg tagtgctatg 46680

gctgaccaga gtggccatca gggggcaggg cagggcttgg gcaggaagtc ctgagtctaa 46740

atgtcccagt caccttgact ttgtgcacat gaaggacatt tgaaagggtg ggcaagggtt 46800

acaatctccc agatgaaagt caagttgttt gatattttag gtctcagaag ccctttgctt 46860

tctaaatttg tttctgaggg tctagaagag ggaaaggaag cctgaggctg gagagactga 46920

agagaaggaa ggcggaagtg agagacagac agacagacag acagacattg ggtgggcttg 46980

attttctgcc cctagtccct gactatactt tctagacccc gagagccaag tcatcttgct 47040

gatctgccaa tcaatctgtc caaggccagc tatcttgaag gtgtccctgt gcttctcaga 47100

accaactgac cctaggaggt acttgctgag cagcatggta gccagaacaa caagaccaca 47160

gccaggggct tcagatctaa gggtctgaaa gcgaggtcac ctcaagaggc tccctatgta 47220

gtcagcatgg atttaggcag gaagaatggg aatagtgggg cagactggga cctggaactg 47280

gggagcctgg acctctccgg cttgtggtag aaatgcggaa cagtgctgta cggacagcat 47340

cctcactgat ggcatttgtc cctgctgttc ttttagtaga aagtgtcatg tcaccacctg 47400

agctgcctcc cagaaacatt cctatgtctg ccgggcccag cgaggccaag gaccttcctc 47460

ttgcaacaga gaacccccga gcccctgagg tcacccggct gagtctctcc gagacactgt 47520

ttcagcgtct acagagcatg gataccagtg ggtgagtttc tacttggagg gtcccctggg 47580

aagccggcct atgcctagga cccagcatct agcgtccggg catgctcagg cctggactgc 47640

ggggagcagc agacctgctt attcctagta aagttgaaag ccaatgtctg tctttctagt 47700

caggactatc tccctgcccc caagtcatgt ttgttcttag tgcatttcgc tattaagatc 47760

gtcacatttt tcttgtgaga ttttatgatc ttcttggagt atattttaca ttcataactt 47820

ttaggtcttt ctgccctgta agtacaagat tgcttttttg tgtagaaggt cactgccgtt 47880

ctgtggtcag atggcacctc cacacccacg gtcgcccacg ttcctgaagt gtttctcatg 47940

ggaaatcact ctacagggct gctcctgcct gttgtactac atcccaccac aggcagctgc 48000

gggaggacct ctccatgcca cagccacacc ttagtctgag aaatggtagc cccaaacccc 48060

aggaaaagca atgtgactat ctacgtgctt gctcgctccc cacatgcctc ccatggggct 48120

ccaggatggc accaccaact tatagcttca ctaatggagt acacagtgct aagtgacatt 48180

cttagaactc catcaagtgc taagagaaag gctgactgct aagagccaga gcccagggag 48240

tacatggtct cctgggatat gtgacagtgg tgaggggatc aggtaacaca gcctaactgg 48300

aaggagacaa tttgggtccc ataccaggcc aaatccaggc ctgtggggac cactatctgg 48360

tcttgcattc ctggccctaa tcactcacag aagtcaggca tggtggtata tgtctgccat 48420

tccaacactt gggaagtcaa ctggatgttg tcttatacct atagtgccag cactgagaga 48480

aggcttgata ctgaaagatc aagagttcaa agccagcctg ggctacatag gcagatccta 48540

tcctcccacc ccacccccaa ccctagaaaa atataaaatt aatactacat tgacttccca 48600

atggaggcct gagtgactcc aacccctcta tactgtagtg ctccaggtca cctttgagag 48660

tttggacttg gcatgagggt cagacctatt tgttatctct aggctagctt gttcttccag 48720

ccctcggctt acagctgtgt gaactaagga gctagagaac tggccctcca cgtgtttctg 48780

aggctgtctc tttaaaacac aaaaagtcag atggaggctg tatgcacctt tgatcccagc 48840

actcaggagg cagaggcagg cagatttgtg taagtttgag gccagcttgg tctacagggt 48900

gagttctaga acagccaggg ctacacagag aaaccctgtc tccaaaacaa acaaataaac 48960

aaacaaacaa aaagaatgac catttcccag cccaattacc cctccaacag cacccaaaag 49020

tgttctcatc ctaagaggta ctgtactctg tggcctctac ctgactttga tcaattcctt 49080

gctctgcagt cccacctgag gggactttca ttatttcttt gtttctcagg cttcccgagg 49140

agcacctgaa agccatccag gattatctga gcactcagct cctcctggat tccgactttt 49200

tgaagacggg ctccagcaac ctccctcacc tgaagaagct gatgtcactg ctctgcaagg 49260

agctccatgg gtaacggaga gccctgagag aggggtgggg gagcttcatg ggtaatggga 49320

gcccctaccc acccaggagg atggccacag caagagaaag tgctcattag agtgaccctg 49380

ggtctcctct ctgtccagat gtctctgcag cactcacagt aattggccca ggtggagtct 49440

ggaatgttcc aggcttgttg gaagctcttg ctctcataga atctgagctc taactgagct 49500

gggaaagttg atcatttgtt tattcctttt agggtattgg gggggcacgg atgtatgcat 49560

gctggtatgt atgcatgctg gggatgtagg gcctcatgtg tgctaaatac atgtgcctat 49620

tctgtaatgt tttcttgttt gtttttaact caaattaatt agaggcagtt tctctgttta 49680

aaaaaaaaag gtggaagaca ctgccatctc atttgtgttg ggacttgaga tatatatata 49740

tatatatata tatatatata tatatatata atttatttat ttatttattt gtgtatgtgt 49800

gtataagaaa gcgggggaca ctcatgtgcc atggtgtgta tggggacaca cgtgtcctgt 49860

gggggaacat tgtgtcaggg gaacaagcat gtgctgaggg tggggggata tatatgtgcc 49920

actgtggtga tttggttggg ggagacatgt atgtgctatg atacctgtgt gtaggacaga 49980

ggacagactc acacaggagc cttcttacct gctgagctgt ctcagcagtc caagccctcc 50040

aggctgtagc cactcttcct ccttgctaca gtcccacatc cagccaacac ataaaggctc 50100

tggcaggaaa taaattaaat ttgctttgtg tgtgggtgct gacggctcaa gtcttccggt 50160

gatggtaatg tttaaagcaa gccaacatca gttctccagt gccgaagtta ttgaatgact 50220

gaccaatggg taacacttag gatttttaaa aattattttg attgtacatt ttgaattaca 50280

ttaatctetaa aataaactac aaacataaac acactgtggt tggagctggg catagtggca 50340

catgccttta attccagaac ttgggaggca gaggctggca tatctctatg actttgaggc 50400

cagcctggtc tatataatga gttccaggac agagagatcc tgtctcacaa acagagaaaa 50460

ccccataact atacttttat tgtactggtt ggtctacagt gtaaagaatt ggcaaagaat 50520

atgaaagatt acactgggaa gaaactgaaa gccatccaga gtaatgaagc aaacccctca 50580

caaatgtggg aaacatttca catgggtgag ggcttgcagc cagcctgtgc taatttatgt 50640

tttggtacgt gagcacttag agcagtttcc agttctctgg tgctctacca atcttagctt 50700

ggttaatatt cagggatgag tcttccatca ccggtaatct aatttgccat tctatttaaa 50760

ggctcttaaa ggcacaggca gtgcattggg taaatgtggc aaataatttc tttgacaaat 50820

cgaccaattg tcagattggc ctgctagtca tttgtttcaa tgagaactgt tttttctcaa 50880

aggatgctct tgtacaccgg ctagaagcag ggctgtcatt tttataggtc tctgtggtat 50940

tttgttgttg ttgttctgtt gttgcaaatc atttatcact gagggaaaat acacacaaag 51000

ggccctttct ttaaaagtat acatgtatca ttttgtgaca gctcataaga agctgttttt 51060

ttctgcctgg acacaggtcc tgacctgtgc tgtgtccttg ctaagctttg tcagaccctt 51120

ccacagctcc ccccaacaac gagttcccca gtacctgcct cacctcatca ctatggtgac 51180

agcagcctct gatgcgcctg actctctggc acattatggc agtgttaaaa gcttccatct 51240

ctctctttgc tgaataatga acctcaggtt gttcaagaga ccggaatgtt cttcacctgc 51300

ctgcacacat ctcttcactt tcttttatag atcaggtagg gactgggcgt gtagatggaa 51360

caaactgttt tccgttcccc agccatctct gcaggtgcac tccacataaa tcaagtgtta 51420

aaagtgcttt gattaaacag gacaggcgcg ttcttgagtt catctgttca catactgtct 51480

ggcaagcgct gactgagggt ctcctctgta ccctgttctg agaactaaca aaagacgaat 51540

caacatacag aaaactgtta tttagtgact gattaaacta acgaaggcat gggctggaga 51600

aataactcag cagttaagaa catttgctga tcttgcagag gacctgggtt gggttcctag 51660

cacacacagg gacagttcca gtcccggtgt gtccttttct cacttctgtg gacacaagtt 51720

ttacacatag tgcacacaca tacactcaca tatataaaac agaacattta aaagtatgtt 51780

taaataacgg aatcatttat ataggttttc atttacatag gtaaataggc aaaaatctgc 51840

attttattgt ttctaagttt taatttattt ttctctgtgt gtacatacgc atgcctcctt 51900

atctgtatgt gcgtgcactg tgtgcatgca tgaacccaca gagaccagaa gagtaccaca 51960

gattctctgg agctggagtg attgataggc tgttgggagc cactccacat ggggattcag 52020

agttgaactt cgttctctgc aagaacagcc agctcttaac tgatggcttt tacctccagc 52080

caccttttcc tcatttttaa aatttccttc cttccttttt gagacagggt ctcaatactt 52140

agctcatccc aacttgaccc cactcttctc ttgccttagt caccacaatg tttagtttat 52200

aagcatgcgt cactatgccc ggctttaaat aaactcaccc ataatcccag cactgaagta 52260

gacaaaaggg aggatcgatg gggctgactg gccacaagcc gtgcttcaag ttcaatgaag 52320

accctgtctc aagggaataa ggcacagagg atagagccat acgcctgacc tcctcctctg 52380

gcctctaccc aggcacatgt gcatacacac accacacaca cacacacaca cacacacaca 52440

cacacacaca gagagagaga gagagagaga gagagagaga gagagagaga gagaaacttt 52500

ttcctctttt tttttaaaaa tattatttat ttcatgtata tgagtacact gttgctgtct 52560

tcagacaccc caaaagaggg catcagatcc cattacaaat ggttgtgagc caccatgtgg 52620

ttgctgggaa ttgaactcag gacctctgga aaagcagtca gtgctcttaa ccatctcttc 52680

agccccacaa agaaactttt aatgagcaaa taattgcttc caagtaaata ctactaatat 52740

atttctaacc atactataca aggaattatt aaagaacgga taataggaga ataaaaaatt 52800

ataagtcact ttataatgct atctaatcca tctagaacaa aaacactgta ataatgcaaa 52860

agagcgcagt gcctagatta aataaataaa atgcagacca ataagtaaac tttatagcag 52920

cacatggaaa tgacgaaatt cctaacaaaa agctcaagat gggcagttta tttaaagtga 52980

aatacaggag aaataaagca cagaaagata ctcaaaggca tagaagttaa catagggggg 53040

ctggcgagat ggctcaacag gtaagagcac ccgactgctc ttccaaaggt cctgagttca 53100

aatcccagca accacatggt ggctcacaac catccgtaat gagatttgac tccctcttct 53160

ggtgtgtctg aagacagcta cagtgtactt aacatataat aaataaataa aactttaaaa 53220

aaaaaattaa agaagttaac atagaagccc actcaggacc ccactcagtc ctagagtatg 53280

acattattat ggacattaaa aagagaaaat tcagcagtag tgtgcatgca ctgcatatat 53340

acacaaatcc ttgagtttca taccaaatgc ctttagacca cttgtggctc tgcaaacctg 53400

taatcctagc acttgtgaaa aggtcagctc aggaactttg ggaaggtcat gaaactcttg 53460

cccctccaga agggagaggc taattaacat ttctcagacc acagggcggg aaccgacctg 53520

cgggtgggga cagactgttg cccatttcca gactagggaa gtccttgtca cctcattccc 53580

taaagaccaa tcaatttaaa gggtgcactg ttccgccaat catattgtgc ctagttgctg 53640

atgctctatt ctgcccttag aaaccgtata aaaactagcg aaggggtacc aggggtaacc 53700

ccctctcctt caggtctggg acaatcccac tacactggaa caataaattc ctcttgcttt 53760

ttgcattgat cacagctcca cttcgtggta agctaagact ccctggagtc ttacattggc 53820

aaatgcaggc aaaagaatcc gaaactcaag gtcatctaaa actacatagc aagcatgctg 53880

ctagcctggg ctccatgaga ccctggggga ggggcagagg gagaccgttc agaagacagt 53940

caagatgttg cagcagcaca ggcagcctgg ccaccagtgc tgtcaccaga catgttaatg 54000

ttggaataaa gcctcaatca tgactctccc agttttataa ttggaaataa gaaaggaaag 54060

actataggaa caactgtgtt cagaacacta tttataatag caaagatctc agagtaaccc 54120

aaacttctag acattgattt gggaagatct cttggcagct tattttgaaa actttacaat 54180

gttaaatatg taaaaacaag gacagttttg ttttttgttt tgttttgttt tgttttaggg 54240

atatatattc atatatgtat atgaatgaaa acccaaactt aaaattcccc actatgcttt 54300

aaaggctttc tgacaataac agaaagagaa atagagaatc cataaaaact agttctgaaa 54360

ctatcaatag gcttgacact ctttagctgc caggagagct gaatctgaac acagggaacc 54420

ccacccagca ccccaaattt ggattattgt tttattttat ctttccccta cccccaagac 54480

agggtttctc tgtgtggtcc tggctgtcct ggaactcgga gatcctctgc ctctctgcct 54540

ctctgcctct ctctctctct gcctctctct gcctctctct ctctctctct ctgcctctct 54600

ctctctctct gcccctcgct gcctctctct gcctctctct gcccctctct gcccctctct 54660

gcccctctct cttcccctct ctgcctctct ctctgcccct ctctgcctct ctgcctctgc 54720

ctcctgagtg ctgggattta aaggcatcag ccatcacttc cagcttcctt tatcatttta 54780

aaaagaattt cctatgtgac tactgtattt aaatcaccac acggccaata ctccccccca 54840

actcctccca aatcccctct acccactcaa attcttatct tgtattcttt atcattatta 54900

tacatatgtg tatatatgtg tgtgtgtata tatatatata ctatatactg ctaatgagta 54960

acatttagtg ttattcattg ttgcatgttt tcaatgtgct ttccaggagg ctggggggat 55020

ggctcagtgg gcaaaattct agctgcacaa gcctaaggac cagggttcag atccccaata 55080

taaaggctgg ctggacatgg tggcttgcct atgatactag catgcttgct ggaagcaaag 55140

acagggaatc cctggagact tagaatctca gaagtgatct gggctggaca gactagctga 55200

actggccagc tctgggttca tcaagaaacc ctacctccat aacataaagt gtgatggaga 55260

aaggcaccta atgtcaacct caaaccccta cctgcatgtg cacacacata catccacacc 55320

acacacacac acacacacac acacacacac cacacacaca cacacacaaa taaataagta 55380

aataaataaa atatttagct ctccagacca aatcttggtg aaacccatgc atttgcattt 55440

gtgtgtgtcc tacaaacact gaaggttaag aagcatgctc cttagtaatt ttatagcagt 55500

ttgcgtttcc agattgaaaa cagattctat aggctacaca gtgctaaatg gattatgctc 55560

agatacagat tgaaaaggat acagattgaa aagggtcggg gtctgggcca ggatgacggg 55620

ccaactgatc tttgccgggg cttgtccttc agggaagggt tacaggattc accactgggg 55680

tgtggcctat ctgctgttag gacctgaatt gcctggagtg tttctagttc ccactagttg 55740

ttgaacttta ccttgaacct ctgctcccag ggaagtcatc aggactctgc catccctgga 55800

gtctctgcag aggttgtttg accaacagct ctccccaggc cttcgcccac gacctcaggt 55860

aaggggtttg gatttggaaa gatgcaattg ctataggagg gactctgaag gcagacagac 55920

gcaccgcctc ctcacgttgg ctagtctaat ataaacatcg cggtggatgg tgaggataga 55980

ctccatgccc ttttgtgaag gcatttcctg gcatcagctc ctgacttcag acagtttcac 56040

ccatcagaca aattgcctgg tgttggagga ggaggtgagc agggccattc ccatcatttc 56100

tcctcagaaa tggaaaggca aggaaaacat gaggttcttc agacacttaa tccctgggac 56160

tgcaaaatgg tggtgcccct cctccacagc tgctcacggc ggggcaggag atgagggcca 56220

aatgaagcat agatctagct attttttttt tagtgccttc agtaaattta aaatcaaata 56280

agggaaggga cctagatctt tatgttatgg cattgttaaa agtgagaact tgtagccagg 56340

gtgtggtggc gcacaccttt aatcccagca cttgggaggc agaggcaggc ggatttctga 56400

gtttgaggcc agcctggtct acagagtgag ttccaggaca gccagggcta cacaggtttc 56460

tctatacaga gaaacctgtc tcgaaaaacc aagaaaaaaa agaaagaaaa agaaagaaag 56520

aaagaaagaa agaaagaaag gaaggaagga aggaaggaag gaagaaagag gacaacatgg 56580

tctaggggtc agagagcaga atctccaaaa acaccaacaa tgcctgctgt aaatgtatgt 56640

cgttgatttg gggatgttgg cctccagctc accatttcct gccttagcct ccaaagtgct 56700

aggattatag gcttgagcca acacatctgg cttacgccta ttgtgtgtgg aaggggagtg 56760

ctgagtgtgc tcctgtgttt ggtacttata tatgaatata tgtatatacg catgtacgca 56820

tacttgcatg tgaaggccag aggccaatgt cagctgctct catcttatcc tttttattac 56880

attgtattta tttgtttgtt tgtttgtttg tttcatcgta cgcatgcagc cactcatgag 56940

catgtaacag cacaggtatg aaggtagact tgcaggagtc agttctctcc ttctgtcact 57000

tgagttccag gaccacactc cagcccccag gcctgggctg taagagagcc atcttactgg 57060

tcctctactt tgtcttctga gatagcatct agactcacgg aacctggagc tcatctagat 57120

ttacattggc tggccagctg atgcatttta aggtcaaatc ttcattccat ccctacccca 57180

cttccactcc cagtgctgga gttcgggaca cctgccacca agcccagttt ttcctggatg 57240

cagaagctcc aaactcaggt tcccatgttc gcatggcagg cacattttca gttaagcctt 57300

ccccccagct cctttaccct ggtctctgaa tgggggggag gctataaatc aggctgctct 57360

cagacattag gtaggaaata gaccatatac atgaggaaag atattcacct gccccatggg 57420

taccaggaag tgatgtccaa ctcctctttt gcttatcagg agaaatgctg actactacct 57480

ctggtaattt tgatgttggg aggaacaggg acattcatag gaccccattc ctcgctggtg 57540

agagtggaga caggttttct gaagggcagg agatctgtgt agaaaagatg gatgctgttt 57600

tctgaaggga aatggaggta gagtcgacct gggagagagg ggaggtgggg gatggttggg 57660

aggaatgaaa ggaagagaga cggcagttgg gatgtattgt ataagagaag aatcagaaag 57720

aaaaaaagaa aagctacctg cacccttcaa gtgttcctct gtgtgggagg ctgtctcagg 57780

gactacatgg gcaccgagag gcatcagtga gggtaggtac ttgatgttgt gtccctgaaa 57840

acaaggacag gaaatctgct gcatggccta agatggcaaa atgtggcaca atcaagtaag 57900

gcccaggatt ctgtctgtgg tgcagacctg ctgtagaatg agctcccagc attcccactt 57960

gctgtgtgga gacagcatgt tgcagagcca tgtgaggatg agggtccagg ccaggaggat 58020

gtcaacccac caccatgtag ccagtgggct gggggagctt gggcccacca aggagcttga 58080

gcagactgac agtgggttat gtacacaagt gggcgtgtca cacaaccgtg caacacagag 58140

aaaatccctg tgatgacaac ttctaaacca ccctgaggca aaaggagtag acaggggatt 58200

agagcctagc atattggagt cgagtggcca tgcagctctt ggaagcgtga ggaaggaaat 58260

ttcctggaag gataggttgt cttcctagca gcctcgtcaa tagatgtcaa tgtatgaggt 58320

agtacctgct acaatcctgc ttcttcagaa gactgaggca gggggattac ttgaacccag 58380

aagttctagg ccagtatgga caacatagca agagtctgat ttaaaaaaaa aaaaaagtaa 58440

agagggaaac caaataggtg acgtgccaca ctagtgtctt cctgctccaa gggtcctggg 58500

cacatgagct tgcttagtgc cagaaaagtc agaggaggag agggcagaca gagaccctcg 58560

tctccacctc ctttgactga ctaatggggc tggatataat ctgttttaca aaaggacagc 58620

ttttcagagc tgtttctatc taaggttgct ctgaatagcc atctcgaaat atgccagaga 58680

agaatattta gaggcggcct attttggtct cccacaaaga tttcacaggg aaaatgtatt 58740

tgtgttctat ttatacaact aaaaatatgc atcagcccgg ggaaactggc tctttgctgc 58800

ctttgaagtg aaggatgggt taatttctaa gaaagtaaaa gcaaatgtag tgcaggcacc 58860

acggatgctg ccagacacca gcgtttaagt ggcttgaatg gaagagcaca ccccaagtat 58920

ttgaagtagg tgaggagaga gaccaggtct ccagagttgg gcctgcagtg gccagggtaa 58980

gtccaagggg cagtatgacg cagaacagag tggcaacctc taccagtagt agaattcagg 59040

tctcattcct aacctcccat aaagcagaga atattgcact ctctttctct ctctctctct 59100

ctctctctct ctctctctct ctctctctct ctctctaact cacagagatc ctcctgcctc 59160

tgcttcctca ttgctggaaa taaaagcatg tgccaccaca tgcagctctc cttgtttgga 59220

gagagaaagg cagacagaca gacagacaga cagacagaca gagtataatg tgtgcagatg 59280

tccacagaac tccgaagagg gtgttggacc cccagaaacc ggagttctag gcagttgtaa 59340

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gctgtccagt 
aaccactaag 
cacagtccag 
gagttcagag 
tacacgagga 
ctcttgccgc 
caaaagaaac 
ggcctgcaaa 
gcccatgcat 
caagtttgtt 
agcaggaaag 
tctttcttcg 
cttgcataaa 
cagtctggta 
gctgtaatgg 
aacactgaac 
tttacacaca 
gggcttgcct 
gcagagcgcc 
gaggaagacc 
tccagagaca 
gggttttaag 
ggcaatgtag 
actgtgtagg 
atctattttt 
tcatgtcctt 
accagcaagg 
cattttcatt 
ccaattgaca 
• cctcctgatt 
tcctatgact 
agaatctacc 
agcagtattt 
gaggaactca 
acactaaaat 
agagctcccc 
tcaaactgag 
aacaattatt 
tttttttttt 
agaccaggct 
• aaaggcgtgc 
aaaacaattc 
ttctttctct 
tttttttttt 
ggaactcact 
caagtgctgg 
gtttatttat 
gccagatctc 
acctctggaa 
■ tttttcttaa 
gagacacccc 
cagatttcca 
tctgtctgtc 
tgagtgagtg 
tctcctaccg 
cttttcccac 
tttgtttgtt 
acctaaacct 
atcgtcttcc 
tgtgaatcta 



gggtgttgcg 
ctatctgtta 
ggggctgacc 
tgacatctcc 
gaaaaggaac 
tgtagagccc 
acaacagtca 
gccctttcaa 
aagcaagggc 
catcttacat 
catgtccacg 
ttgcttggga 
tctataaata 
tagcaattta 
ccaccaaaca 
acagtatggt 
actcacaaat 
gagctgcagt 
cttcctagat 
agaagtgagg 
ggctttggag 
ctgtggggat 
gagggcggga 
tggggtgaga 
ttggccatga 
gtctacagac 
aaggtatcca 
acccaggtgc 
agtctgctgt 
tctctgcagt 
cgctgacatt 
aacaggcgtt 
ggccgtcggt 
atggcagaga 
cgctgcctaa 
attggtccaa 
ggccacacgt 
tttgttttgg 
tgagacaggg 
ggcctcagac 
gccaccacca 
ttatgatttc 
ttctttttct 
ttttttttgt 
ttgtagacca 
gattaaaggc 
tattatacat 
attacagatg 
gagcagtcag 



tggaccccca 
actgcccacc 
tttcttctta 
tgtctgtctg 
agagagagag 
tgtgggtccc 
aaaaccatct 
tttatttttg 
tccaccacac 
taattattac 
ttgctaagat 



aactgaactc 
gtcccccaag 
tgctacaagg 
agagttcttc 
agatatgtcc 
ctttccttgg 
aaatagttgc 
cagtgtagcc 
tgggataaag 
ccttaggtaa 
gtaaggtaga 
ttgtacttgg 
tatggttgga 
ttagctattt 
ggtcctttaa 
ggtttcaact 
gatacagacc 
agtccgggag 
gaaatccaag 
aggaaaaggc 
cttgcctgtg 
gggtatattt 
aggtgatgga 
agcactgtgg 
ggaatttggg 

ggggtgataa 

gtgcccaccc 
ccggagaggc 
cttccattga 
cccctatttg 
cctgctgtcc 
cccttatccc 
tcctgtgaag 
atagtagtca 
atgcaccagt 
gatggtatga 
tgcctataaa 
gggtttggtt 
tttctctgtg 
tcagaaatcc 
cccagctttt 
cttaaaaatg 
ttctttcttc 
ttttcgagac 
ggctggcctc 
gtgcgccacc 
aagtacactg 
gttgtgagcc 
tggctcttac 
tgattcttgt 
ccaaagccct 
ccaccttggt 
tctgtgtgtc 
agagaatatg 
aagagattga 
tgcctgactt 
ttttcatgag 
ccagcttagc 
cagtcgatgt 
tctttccgga 



aagtcctctg 
aatgtcttat 
ggtagaggaa 
cttcccttgc 
agtatgcctg 
ggagtgcaga 
tggccaaggg 
aaggccatgt 
ggctactgtt 

gggtgggtag 

tactgtagag 
gaccttcact 
gtcaggtcca 
gtctttagac 
gtatctttat 
gtcatggagt 
acagcagggt 
cagctcttga 
cctctggtta 
tcagagtgga 
gttcagccta 
tggaaggcga 
tgatcgagga 
gtgaggtgga 
gtgcacacac 
tggtactgga 
cacagctcac 
cagtcccatc 
agataaggta 
cactgagtgt 
acaggtcaag 
tccggtcacc 
agagtgcttt 
ggccagccac 
gaagtacccc 
acctccttaa 
atggtgtctg 
tttttgtttt 
tagccctggc 
acctgtctct 
tagttttttg 
actggcttaa 
ctttctttct 
agggtttctc 
gaactcagaa 
acgcccagct 
tagttgactt 
accatgtggt 
ccactgagcc 
acttccagct 
cttcatccct 
tttttaaatt 
tgtctgtcta 
aaggaaaagg 
ctccagattg 
ctctactgct 
ataggacccc 
ttttcatttt 
tcaattagcg 
agtagatgac 



gaaaactgga 
cttgataggc 
agaaaactgg 
cgtgtagtaa 
gcatcttgaa 
gaagtgctgc 
agggcatgtg 
ttgacagtac 
caagttagtt 
cttgttagct. 
tttagctttg 
catgcagccc 
gcatatgtgg 
aagtaactat 
tggccaaaca 
ccacattcta 
gatttgacag 
tgggatctga 
ctcctcagca 
gcatctaagc 
ctgtgggaag 
acaagaacac 
cggccccaga 
gttagaggga 
acacatacat 
aggtagaagg 
cctcagacag 
accatggttg 
ctcacaggtg 
ccaagtccca 
tccttgctgc 
tttgaggtga 
tgatttaggc 
ctcgctgggg 
tcatgggagg 
aataggaagg 
gtacaggaac 
gttttgtttt 
tgtcctggaa 
gtctcccaag 
tttcttgttt 
gtggagacct 
ttcttccttc 
tgtatagctc 
atccgcctgc 
ccttctttct 
cagatgaacc 
tgctgggatt 
atctcgccag 
caacttggtc 
gagaggagtc 
tcacatttat 
tcgtgtgtgt 
acagcttgca 
ccatgcttgg 
cctggcttta 
caaatagcct 
ctaaacatgt 
gttgtaacaa 
acatcagatt 



aagtactctt 
cttcagatct 
ctccaggcct 
tttcctaccc 
agggcactga 
tagagaggtt 
ccctgtatcg 
aagcctgtaa 
atatacacat
ccacctctcc 
ttttgacctt 
atgtttcagc 
gtaccagtcc 
agggttctga 
ccagggttaa 
gcaggggaat 
agacaacgaa 
gatctagatg 
agagtagctg 
ctgcagcacc 
agaccagtga 
cagcagctca 
gattttaagg 
agaagtgaag 
acacacccca 
tggaaagagg 
accttgttct 
ccaaactcag 
tcctgtagca 
ggttctgcat 
acgagggctc 
gtgattctgg 
tgtgaggagc 
cagtgggtaa 
ctgcaacgaa 
agaagtgaca 

agtgggagaa 

gtttttttgt 
ctcactttgt 
tgctgggatt 
tttgggggag 
agaaaggtgg 
cttccttcct 
tggctgtcct 
ctctgcctcc 
ttttttaaag 
agaagagggc 
tgaactcagg 
cccaggtgta 
ctgcctttat 
tcacaaaaaa 
ctatcatctg 
gctgagtgag 
agagtggatc 
tggcagattc 

ggggggcttt 

ccttgagtct 
ctgcgtagga 
ggccccagag 
aataagtgtt 



59400 

59460 

59520 

59580 

59640 

59700 

59760 

59820 

59880 

59940 

60000 

60060 

60120 

60180 

60240 

60300 

60360 

60420 

60480 

60540 

60600 

60660 

60720 

60780 

60840 

60900 

60960 

61020 

61080 

61140 

61200 

61260 

61320 

61380 

61440 

61500 

61560 

61620 

61680 

61740 

61800 

61860 

61920 

61980 

62040 

62100 

62160 

62220 

62280 

62340 

62400 

62460 

62520 

62580 

62640 

62700 

62760 

62820 

62880 

62940
gtcaaactca tcagtgaatg aaccatttcc caacagttca tctgtttcct ccaccaaatc 63000
tgttggtgga gcagctgggt tacacatggg cgagctgggc tcagccgcgg tgccagccat 63060
ttttccagca tgctatggaa tataggggtg ttatcaggcc agggtgggaa aagcctggga 63120
atagcgagcc agatagcctc acccgggaca ttgaccatcc cagccatgta gataaatcag 63180
agttgacatc accaaagcca gccactgact gcatgtctta tgctctgaag gtgaagtcag 63240
agtccctggg cattcctcag aaaatgcatc tcaaagtgga cgttgagtct gggaaactga 63300
tcgttaagaa gtccaaggat ggttctgagg acaagttcta cagccacaaa aaaagtaagc 63360
cacccacccc aaccctgcaa cacacacccc atgttctggt tcctgtgtga gagctcttat 63420
gaatgaccag tgactagatg tcaacagcga ctgtcttagt tacttttcta ttgctatgac 63480 

aaaacaccat gaccaagacc acttacaaaa taaagtgttt gatgtgggac tcacagctcc 63540 

agggggttag agcccatgac catcacaatg gggagcatgg taccaggcag gggagtgtgg 63600 

cgttggatca gtagctgaga aactacgtct tggttcacaa gcataaagca gagagggcaa 63660 

actagaaatg ccgtcagctt ctgaaatcac aaagccccgc ccccagtgcc acatctcccc 63720 

caacaaggcc acacctccca atccttccta aacagttcca gcagttggaa accaaacatt 63780 

caaatatatg tgtctatggg agccattctc attcaaacca cctcagtgat tgtcagcaaa 63840

agccagtccc accccctttg cagggtagtc cctgtccatg ggtgctcccc aagccgctcc 63900

ctgcctatgg aagtcttttc ctgatacaag ggcactcctt aggggcctcc cctgcatact 63960 

tagccgccag ccctgaaaat gagattataa atttgtcaca actgccccat agcagtcctg 64020 

tgtacacacg gtgatttcat gtgtctgtgt gactgtcctc ttccctgcaa gcatgcagtt 64080 

acctcactgt gccgggacag gccctgtact cacggtattg ttggcagagc tgctgggtta 64140

ccacccatag caaatcacta accctgtgga gctgaagtat atgaagcctg cgtctaaatg 64200 

gtccaaactc tcactgccgc tctgtcactt ggcagctggg ggaccttgac ccgtctctct 64260

gaacctcagt ttccagttct cagggatgag aaagattgag tgaacaacat gagggacagc 64320

ccaacaccca gcttcctttc caccagctct cctcatggac accgttgagc cactgcagct 64380 

cttctgtctg ttgtactgtc cagtatttgt acacacagca ggaaagttcc ccatgacagc 64440

tggcggagaa gctgctgtgg gcagagcagc atccagagaa gcaagccacc catatggtcc 64500

ctgcccacat gcaatttgta ttctggtgca ttgagggtga ataattctca gctaaatcta 64560 

ccaatacagt ggcttcagct actcggaagc tttgaagaaa tcaggacagc atgataggta 64620 

aagatccttt gggaagagag aatggacaga ggaggctagc ccttttaagc aatgagcttg 64680 

tgagctgaga tctcaaaaaa gctactggtt ggacaggtgc ctgagggaag agtctcccag 64740

cgagtgcata aggtggagtg acaggatctt aaagaacaga aatagatgtg gggacagtgt 64800 

agaatggaat aggagggcaa tgggaatgtt gaatgaccta acctcctgaa gaggactttg 64860 

gaaaaactcc caagcaagaa gttttaggct agaccttggc gctgttgaat tatggatgca 64920 

ggtggttctg agggtagccc tgggtgaatg ctcatctcca tatggggttg gacaagccca 64980 

ccttcaaatt caggttctac ctcctgccag ctgtgtaacc tagcctctct gcgcctccag 65040 

tctccatctg taaaatgggg atgaagacaa cagatgtcag gtggagattg catgcttatg 65100 

agagcaactt gcacatctcc tcactcctct aagttgagtg agagaatgcc caggcagggg 65160 

ctagaaaggg gacctgttga tcctacacct cagcactggg gaagaaaagg gctctttttt 65220 

tccctcttgc attttgagtt cgtggcagaa gccaggacag attaccaaaa gaaaggagaa 65280 

tggattgatc acatgtaagt ctcgtgacgt ggaagccctt acagggaaag caggaccagg 65340 

aggatggcta agcagagtgc tgtagccaga tttccaaaag gtgggaagct agaaagtgat 65400

agcagagagg ggctgggggc taagtttagg gcaatgggaa gagcttggcc aggcccgtaa 65460 

gctcagactc ctctcagtct tgtttgtcct tggcaataaa tgccttccct tcgggtacag 65520 

agagcatact tgtcacatga gggttttatc tccagcttag aggaagacca gaaaatcctt 65580 

tctccctcct catccttctt gtcagggtgc tatattttgg ggtactgggt cctcagctcc 65640 

atcagtaaat agcacaggag gtattagaag tcaagaactg agtttcccag ctcagtcact 65700 

gtgacttggt ttggtgccca ttctggtcct tggttgcatg ttgctacaag gcttaaaaga 65760

atgaatatca aaatgtgtga tgcctggcgt ggagagagtg tttggtgcaa ttggccactg 65820

gcggaggagt gtcattggta caattatcct tccatgtttc tgtgaggaca gaatggaatg 65880 

ggggtggaag gtaaagagag gaagtaggaa agacagcctc tggactctgg agtagcaagc 65940 

tgtagggaaa ttacagagct gagcttacat catcaggaca gggctctctc tgccttactg 66000 

tgtctagttg gggagtgatt gatggtttga gcgggaaggc tatgccagaa gccgttggct 66060 

tagaatagtg catactgctt gtacagggca aataagtcca gagattcatt gcagcagggc 66120 

cactggtgac agctacttag agcaaactga actcccgaga agctctgaga gctagggttt 66180 

ggtgctagct gggtgctctg gctactggga tggctgtgtg atgcactgcc cgtgctacat 66240 

agatttaccc attccactgc gtgcccatgc ttcagagcct cctgtgtgcc atggtgttta 66300 

tgtacagctt tctctgtcaa attaaaataa attaataata aaggaacaat gttgagaggg 66360

gagagagaga gagagaggat gtggatagat attttggggt tttttttctg ttggttgttt 66420 

tgtttcaaga cagaggctca tgtttcctat gctagggttg aattcgctct gcagctaagg 66480 

tgagtcttga cgcccattga tcctgggtaa ctgcattcct tacataatct agatctcatt 66540
ggttttagga aagcaggaaa tacaaatgag cacatgttaa ataaaatcat atgttgtact 66600

gtactctccc agacagcctt gtactgtact ctcccagatg cttttaaaag taacaatgct 66660 

aggccacatg tggtggtaca gccttttaat cctagcacac agaaggcaga acaggctggt 66720

ctctgtgagt tcaaggccag tgtgatctac atagtgagcc ccaggccagc cagtgctgca 66780 

taataaagac cctgtcttaa agaaacgggg aggaagggga ggtgggagag gagggaggga 66840

gggaagaagg aaaaagaaaa atagtacaaa ctgagtgtta tggtacatgg ctggaatcat 66900 

aacaccaagg aattttgcac ttttcttttt ataaatttac attgtgtgtg tgtgtgtgta 66960 

tgtgtgcaca tgtgtgtatg cacttgcctg tgtgtgctca cccatgtaaa catacatgca 67020 

aatgaccaga agaggctgtg gatcccctag agctggaatt acagttgtaa gctgcttgac 67080 

atgggagctg gcaaccaaac tggtgtcctc tacaagagat acatactctc ctaaccactg 67140 

agcctcccag cccacactcc aagtgagggt acacagtgtt tcttcttgtg tcacaggaaa 67200

ggaccaatgc tgagtgtttc agacccacaa aagccagaaa tgtcatgatt cagcctcata 67260

gttgcaaatc taagctgtcc tctcttcagc tatgtatgtg tgtgtgtatg tatgtatgta 67320 

tatgtggctt ttttatgaaa cctatcttta attggaaaag tctcatcttt gtgtctttat 67380

tcttatgagc ttgcacccat gaaggaagag tgggaccagg caaaaatgag tgagaagaca 67440 

aaacacacga tgggatgagt tacacagata aagtgaaata aggacatgag cctatgaaat 67500 

aagctggaga gacggcccgg cagctaaggg cactgggtgt tctttcagaa gacccaggtt 67560 

tggttccccc cctcaagaca tgacagcttc cagccatctg taaccctagt tctattggat 67620

cacatgccct cctctgacct ccatgggctc tgcgtgcatt cagtgcacag atatgtgcag 67680 

gcaaacagct gtacacataa aataaagatt tgaatcaaag ccagaaatcc ctttaatccc 67740 

agctcttggg agacagaggc agggggatct ctgtgagttt gaggacaggc tagtctatat 67800 

agagaattca aagacaggaa tatgtagaga gatgtatgta tgtagtctca aacaaaacaa 67860 

aatagaattt atttttaaga tttattatat gtaagtacac tgtaactgtc ttcagacaca 67920 

acagaagagg gcatcagatc tcattacaga tggttgtgag ccaccatgtg gttgctggga 67980 

tttgaactca ggacctcggg taagagcagt cagtgctctt aaccactgag ccacctctcc 68040

agccccaaaa tagagttttt aatttaaaat atttttgaat taataggaag agggaagaaa 68100 

aaaaaaaaag agttctgtac aagggttaaa ggcagggcat caggaattca agccagccgt 68160 

ggctacatac aaccatgttt caaaaatcca aaaagaaata aacagatttt caaagtaaaa 68220 

attttaaagg actggagata catctctgtt gtccagtgct tatcaatcat atgcgggccc 68280 

cttgtactgc cgaaataaaa taaaaacaac cagaaattaa aaacttgagg gaaggaagct 68340

cttagaaggt gggtttggtt tgggtttttg agccagagtc taatgctacc tcggttgcct 68400 

ggcagctttc ccacaagagt catatggagg tactcaatgc ccccacctct gtcagtcatg 68460

acaagtctac cagggacttt gaggagggtg ctcaaagtgg ttatttgttt gtttagctaa 68520 

ttagttttgt aatcacagta atcaaaccca aggacctatg catgtcaggc aagtggtagc 68580 

caaggaactc actcctcatc ctcctgcctc tacctcccaa gtgctggtta tggcgtctgc 68640 

cagtaggcct agctccaggg aagggcttct caaaacaaaa caaaaacact ccaatagcaa 68700

aaaatgacaa atgggatttt ataaaattaa aaaactctga ccagcaaggg aaacagttag 68760 

cagagtgaag agacagaatg ggagaaagtc tttgccagtc acacactgac aggggatcaa 68820

tgccagtgat ggcggcacac ccgcgtttat cagagcgaca tttgttatag ccaagttatg 68880 

gaactgacct tggtgcccaa cattgaagga agaaaatgtt cctgaacatc atgggaggag 68940

tattactcag ccataaagaa caacacgata ctatttacca aaaatggatg ggactggggg 69000 

gtgagggggt acaattaggg aagaaaggga ccaggacaga aacaaagagt aactgtggaa 69060 

tgatcaaaga gtgaatgaag aggttacaat gaatagatat gttaatatat gctcatttaa 69120 

aagtcaaaca tgttttagaa aataattctt ggaaagctac tagagatgtt accagaatga 69180

aggtgtaaac aaagagagag ggagacagaa tgagagagag aagggtccaa ggagagggac 69240 

aaaggagtag atgccagggt gctgaaggag gaaggtttcc aacaggcctc aaagtcagca 69300 

gacctagagg ctgtgggaaa gcacttactc ataggacaga cagcaaagtg cttagtcatg 69360 

tggggaatga gaggggagtc tgaatggagg aaagcacata gcagggggtt aggagggtag 69420 

cttagcatgc agtgtgaccc tccgtaccac caaaaaagaa aagaacatgg tcattgtaca 69480 

gttataaacc caggaaaggc caatataata tgccttttct catactgccc atactcaccc 69540 

catactcctc cctctaactc ctctcagacc caccctcacc ccctccctcc tcccaacttc 69600 

ctgtcatctc actgagtccc tgtgatggtt tgtatatgct tgggccaggg agtggcacta 69660 

ttaggaggtg tggctttgtt ggagtctgta ccctgtacca tcacagtggg tgtgggcttt 69720 

aataccctac tcttagctgc ctggaagtga gtattctgct agcagccttc agatgaagat 69780

gtaaaactca ctccttctgc accatgcctg cctggctgct gccatgttcc tgccttgggt 69840 

gataatagac gtaacctctg aacctgaaag ctagccccag ttaaatgttg tccttataag 69900 

agttgccttg gtcatggtgt ctgctcatag cagtaaaacc ctaactaaga cagaaattgg 69960 

tatgtgctcc tgggtatgag accatcccct gggacctggc actacctgga gctacatcct 70020 

taagaaactt gactttctct ccccagcagt catctgccaa tagttcctca gctaggggag 70080

gggctcatgg gcccaactca gggaaacgtt tttaagttac ataggaaaaa gaacaaccag 70140
gactggtctg tcaggagcct tgtcgtggtc acagtgacat aagcattgag tgtttatgta 70200
gctaaaatac aacaataaac caggtggtca tgcatgcctt ttatttatgt ttctattttt 70260
taagaaacat gagataaagt taaaagagtt cattgaaaaa cttgaaatta gggctggaga 70320
gatggcacaa cagttaagag cactgactgc tcttccagag gtcctgagtt caattcccaa 70380
caaccacatg gtggctcaca accatccata atgaggtctg acaccctctt ctggtgtgtc 70440
tgacagctac agtgtattca tatacataaa ataaatatat cttttaaaaa aaaaaccttg 70500
aaattagttg ccttcagggg gcaggacatt ggagggagca gagtggaaag atactattgc 70560
caacaacaac aaatcttaaa gaataatttg attcattttt aattttatcc ttattatttt 70620
taattatgta tgagtgtatg cctgggtatg tgaggtgctc caagagacca aaggagtcgc 70680
ctggagctgg agttccaggt ggttgtgagc tgcctgatgt aggtgctagg aattgaactt 70740
ggatcctctg aaagagcagt acatgttcct agctgctcag ccgcctctcc agcccctcaa 70800
tttttgatct aaaaaaaaag ttgaaagcta caaattttct gtgaattttt cacaaattaa 70860
gcaagacaga gaaaaggtag ggaggtgaag tggcatggaa ggagaaggag aaaaggatgg 70920
acagtctcat ggtgtgtgga gcaaagctaa ctctagctgg ttctgggtac cacactttaa 70980
aatgagcccc cagggcttag cccaggtatg tgggggccag attggggctt gggtgcctcg 71040
ggagcaatgg acagtggaca tagcggataa agaaacaaat gcatgaatcc ataaacttct 71100
gtttttctga agtcctgcag ctcattaagt cccagaagtt tctaaacaag ttggtgattt 71160
tggtggagac ggagaaggag aaaatcctga ggaaggaata tgtttttgct gactctaagg 71220
taagtgacaa tagacatcta atgtgaggtg aaaaggatca tgggccttga ggaccgtgct 71280
gtgctaccag ggctggcgat cattccctcc ctgggttcta cagcaacctt ttcattcctg 71340
ggaagctgaa catagctagg taagaaccag gctctgggcc agctgggact tgcagagttt 71400
tagttccagc gtcccaggat cacaagtact atcttccagg aacaaggaga cactccttgt 71460
ctcaggccca gtgctgagct ggccaggatc ctagccttta ttaacagtgt ctcttggcga 71520
gatttaggag ccctttgggt aagctctgga tttaagacct gaaccagtgc tcccctgtca 71580
cttctgtgtg acttctgttt ttaaactctg ccctgctctg ctaaagtaac ccctggatgt 71640
gcttcatcca atccagatgt gtggtctctg catcttgatt gctcacctca gaagtcccct 71700
caaatgggct gattaccaca cagaggcctt gctgttcacg ctcagtgagg cgctaaaagt 71760
ggatagaaga acagatagtg tgggtccaca tgctgcccca cctctggggt gtgggacatg 71820
ccacttagga aggtgaaaca caggagtttg cctgtgctta tcccatccta ctgccatcag 71880
gtgctgggct tcagggaaga accaagacac cattctgggg gtaggaagat gcaatctaag 71940
ataggggaga ttgaagttgg ggtttcctga cttccgggcc tccagttgct tctggagaac 72000
tgcacagaga agtcggtaat gaagtgtcag ggattcatgg gcgacacagc gtgataaggg 72060
ggaggggggg ctgtgtgtgt gtggctgagt tgcaaggaat tgagagcaga gttcagggag 72120
ggacccacag gctgatgact gggactagcc cagggggcca ggggaatggg aactctgggt 72180
tagcttatca acaaaggggt gcagatggac tttctatggg agacttaagc ttcggctcta 72240
tagatgaagg ccttccccat gctccccatg gcagttctca atgaaagaga cagtccttgc 72300
ttccaaagtg ttattggttg ttcatcatgg aaaatggctg cttaccaact tctctgggtg 72360
ggccagagat taaatacctt cttcaagaag gaatgtcggt tcaagaaggg acactttgtg 72420
gctaaactct cagggcctct aggtacctac ctcattagta tagagaattc tgttttgggc 72480
tatgtgaaca acatggggaa tatggtatgt gttccaggcc aggaatctgg cagggagcag 72540
ggaggcacct accatcttaa gtgtgagttt ctggagtctt ctgttctcca ttaccaagtt 72600
tcctggcatg gtgaactgtc ccacaagata gggcacatag cttagcatgt gtggctgact 72660
cacgggtcag acattggaga caagctaagg gaaggctccc agtacagtct catagaggta 72720
acccccacca ggttcccaag gcatctactt tcactcatcc tgggagggaa tatgaaggta 72780
cagtggcagg catgagccga cgggaacagt agcaagagag tacagaaagt agagaatgtt 72840
cacatggtta ggcagcattc taggcttcag gagcataggt cccaaactgg ctatttctgt 72900
cccattgctg caagctcagt ggggttgaga cactgaccct tgagccatcg ctggttaaag 72960
gcaggtttca ataaagaagc tcttgttatg tcatgatgag ctggtatctt tagagtgggc 73020
cactgtgata tttgcaatga tgaggttgaa tcatatgaaa ttgctggctt tgaaaccagg 73080
tgagggacac caggcatccg gcctttcata ttggcacttg tcatagttag ggttttactt 73140
ctgtgaacag acacaatgac caaggcaact cttgtaagga caacatttaa ttggggctgg 73200
cttacaggtt cagaggttca gtccattatc atcaaggcag gaacatggta gtgtctaagc 73260
aggcagacat ggtgcaggag gagctgagag ttctacacct tcatctgaag gccactagga 73320
gaagactggt tttcaggctg ctagaatgaa ggtcttaagg ctcacactca cagtgacaca 73380
cttcctccaa taaggccaca cctatcccaa caaagccaca cctccaaata gtgctattcc 73440
ctgagtcaag cattcatagg actgcttgaa attttgtgtg ttctcattta aagtcagcca 73500
ccctattgac gacagccctc tagactaaga gagagacgag tctcatgctc cgctgtgagc 73560
ataggagaaa tacatgtaag agatcaggga ggcccttttc ctaacatgcc ctttggcctt 73620
tgcttagttt tatttaaaaa aattatttta tttggaggtg gatgcatgtg cctcagtgca 73680
tgtgtggagg tcagaggaca acctgcagga gttacttccc tctctaccat atgagtccca 73740
aggattgaac tccattgcat agtgtatgtc ttcactgcgg agccattgtg ctggccccgt 73800
ttctctacat tttctctcac caaataagta attcagtctt acagaaataa aataaaaacg 73860
ttgcaaccaa aatagttaaa agtttaaaag aaaagcagca cggccttcac tgctaaccgg 73920
ttctggactt ccccttccgg tcggcttaag ggattattgt cgtcatcgtc gtcatcgtcg 73980
ttgtttgttt ttatggtttg ccttggtttg ggctttcgtg agacaggatt tttgcaatat 74040
tgttctgtct agcctggtgc tcactattta ccacagattg gccttgaact cacggcagtc 74100
ttcttacctc agccttccaa gtgctggaat tacatggcca tgccccacac tttttttttt 74160
taaagattta tttattaatt atatgtaagt acattgtagc tgtcttcaga cactccagaa 74220
gagggagtca gatctcgtta cggatggttg tgagccacca tgtggttgct gggatttgaa 74280
ctccggacct ttggaagagc agtcgggtgc tcttacccac tgagccatct caccagccct 74340
tttttttttt tttattgcaa tggtgataca cattccattt tgccaagtgg cattttataa 74400
aaattatgta ttgtggccac ctttctgtgg ataaagtaat tctacaccat aatgttaaaa 74460
ctctgtgtgt gtgtgtatat atatatacat acatatattt gtgagtccct tacctgtgaa 74520
catgctccag tgctctaatg tatcttggtg cagttcctca ttttttgctt caaatcataa 74580
ccttgcagtg gaacagctgg gtccaagaag ccatatactc atagctttga atgcatctct 74640
ccgggtgggt gcgtctctca tttatacaca cacacacaca cacacacaca cacacacaca 74700
cacacacaca cacacacaca aagacctttc agtatttttt taagctgacc acatctttgt 74760
agaggccagt gtgcagggag gtggaggttc tgctaagtgt gtctgtcctt cctccctcag 74820
aaaagagaag gcttctgtca actcctgcag cagatgaaga acaagcattc ggagcagcca 74880
gagcctgaca tgatcaccat cttcattggc acttggaaca tgggtgggtc cctgtgcccc 74940
ctccatccta ccagctctac ttgggtccac cttcctgcct cagcttctac aagtggcaca 75000
agggggcacc tctatctctc agccatagtc ctggtctcaa tcccatctat acaacctctt 75060
ttctagagaa ttctaccttt gtcagaggat gatgaacaag taaaataggc tttaggatta 75120
gtgcacccca aaagtagcca aagtagctta ctctgcagtt taccaagggg ccccaggcct 75180
ggaataaaga agtctggtcc ttgcttatct agggaggaag tgaggggagg gatagagaca 75240
tcagtggatg agtagatgga tggatgggta agtggattga tggatagatc ctagtggccc 75300
taagtttaag tgccatataa ataatatata aggttttttt taaagattta tttattatat 75360
gtaagtacac tgtagctgtc ttcatacaca ccagaagagg gaggcagatc ttgttatgga 75420
tggttgtgag ccaccatgtg gttgctggga tttgaactct ggaccttcgg aagagcagtc 75480
gggtgctctt accaatgagc catctcgcca gcccaatata taagttttta gcatatatgt 75540
gtagttaaaa gtaaatgaac aaaatatgta tgctaaatat aatggtagca taatcatatg 75600
gtaaatatgt aagtgaaatg atcatttggt ataagtactc aacaataaaa gctaaggtac 75660
aaaatatata ttggattaaa tatatatcat atataacaat taccacatat tattatacta 75720
aatatatata atagtacatt atactaattg tcaattatac taaatgtttt gggatatata 75780
ttgagctaga attaaatttt taaaatgtcc agttaaatct gaatctcaga caaccattgt 75840
atggggcata cttactctaa aaatcaccca gccaccagaa attcaaattt aatggatatc 75900
ttagtctatc attaactcct caggcactca ccctcaggtg ttcaggcatg tggcacaagg 75960
acctttctag ttctcttgtc cgtctgtgaa gagttctttt ggttgaacat ggcgggaggt 76020
catcccattg ctgcccttcc agcagcaaac acatgggggc gcccttccag gagaagtcac 76080
atggaggcac taggcaggag ccctgctgcg catggcaggt ctccttatga gtttcactta 76140
gagccccccg gggggggggg gggagtttgg cagagatcat tatgtggatt attggcttct 76200
tgtctctctg ttggctgagg gacatagata gatctcctca gggtgtgggg agaatatcta 76260
cccatactta gggtagctcc agaagtagat tacccatcat tccttgaggg agctgcttcc 76320
actcattttt caaaacggaa gctgagacaa gttgcagccc cctcctgttt ctctgttgag 76380
cacctcttca tctcagttct tagctgactt ataactttcc tccaggaagc ctttggtgtt 76440
tcctggctgc acggtagaag cctccatccc ggacatgtca ctcacccctg aaaatgtaaa 76500
ctcctgagtg acaaacgggg ccagatttca tgtctcactg ccaccctaga gcctaggaca 76560
tgtctggcaa cctgttcagt tagatcaggg ctccagagag gatctcccag gctttccatc 76620
ctttgaaatg taatgttatc tttacaacaa gatctcactg agctctgccc atagctaaca 76680
tgtaaggtgg tatcagcact gagtcctcca tctagggcat ttgcccccat gatttttgta 76740
actatgtgat tctcctcttc ccctgttttg aatcccccta atgcccttga cctgacctgt 76800
ctttgactca ttcggtgttg ggttttgctg ttgaaggtaa tgcaccccct cccaagaaga 76860
tcacgtcctg gtttctctcc aaggggcagg gaaagacacg ggacgactct gctgactaca 76920
tcccccatga catctatgtg attggcaccc aggaggatcc ccttggagag aaggagtggc 76980
tggagctact caggcactcc ctgcaagaag tcaccagcat gacatttaaa acagtgagca 77040
gctggccagg cctggggtgg gaagacagca gactctttca agcattccag aagtcagaca 77100
ggatacttcc aaagatgtat aggattgctc aggggtaccc cactttcaga gccacagatg 77160
tgcattgagg tggcaccctt acaagttgat agggtcctga gtccgccatc ttccctactc 77220
ctgcttaaaa gaataatatc gccgggcgtg gtggtgcacg cctttaatcc cagcactcgg 77280
gaggcagagg caggcggatt tctgagtttg aggccagcct ggtctacaaa gtgaattcca 77340
gggcagccag ggctatacag agaaaaaacc aaaaagaaaa gaaaaagaat aatatttact 77400
tcctagatgc attttcagtc ccagttctca tctctgaggt gctttgtctc atttctaggc 77460
attgttggaa gttccccctg aaagctagga aatacagaca gggtgtctta ctcccagggt 77520
ggaaccgggg tgcatgttca gggttccaaa ggtgtacctc agtcctgtgt tgcaacacca 77580
tctcccacca ccaccaccag gttgccatcc acaccctctg gaacattcgc atagtggtgc 77640
ttgccaagcc agagcatgag aatcggatca gccatatctg cactgacaac gtgaagacag 77700
gcatcgccaa caccctgggt gagcatagag ggaaagccat tcctgtgcat gctcctcctc 77760
ctcctccttt tggcaatgtc ccaggataaa ctgtgagagt cctgtcctgg gatcccttcg 77820
tctctagcaa atccagaagg ttttccttgc aaaaactcat ccagggtcta taccagctct 77880
ggcaatctgg ctaaaatgtg ggttctgtct cagtaagtct cagatggttt ccattctgac 77940
aagctttcag gtgacatagt gccagccagt ccagggatcc cacttggaga agtgtgtgta 78000
cttgtgtgtg tccgtatttt ggggtgtgta tatgtgggtg tttatgtgcg tctgtgtctg 78060
tgtatgtcnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 78120
nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnat gtgtgtgttt 78180
atgtgcgttt gtgtctgtgt atgtctctct gtgtgtgtct atgtgggtgt acatgtacat 78240
gtgtgtatct ggggtgaaca tttttgatcc cacaggagtg gatcctgagt agatagtcat 78300
tcactcagtt gggggaggag gcacctccca aatcccaagg ctccaggctg ctccacccct 78360
ccttgtccct ctctgctcag tatcatggcc cccaacatcc ccatagccag aaacagatga 78420
tagcattctc cttcctttct cctatgtaga aaatcacaag gctgctacca cctgccagtc 78480
actggcatgt cccacccccg ctgacctatc cactctcctc tcttcccttg gtctgcttct 78540
ctttctcctt gctaacagca ttagtaccct ttacctcacc tcagcctctg acccctcggc 78600
aagcctgcct gcctgcctga cattctgctt cttctctttc ttcctccctc aggaaacaag 78660
ggagcagtgg gagtgtcctt catgttcaat ggaacctcct tggggttcgt caacagccac 78720
ttgacttctg gaagtgaaaa aaagctcagg taatgggagc cattccctcc atgcacccag 78780
acagccttag cctcgcacac cctgatgctt gcccagcctc aggaccttct ggacatctgc 78840
cgtctgagca aagacaactg tagtacagta gagacctcag gtatatgtga ctcttgtctg 78900
gaggacagaa aacatgttga tgtcattatc aacctagtga acccctttga gcctggttca 78960
tttctaaagg gaaataccac tcaggagcat cctatagctc aaagctcact ttgtctgcat 79020
gtgcctctgt catgaccata accagctatt tcaattgctg atctgtttcc taatgggtag 79080
attcctgcat ggaatctaga ctctgtgttc ccagttgggt ttcccagaca cactcaggtg 79140
accactggga cagagcttga gtctaagtct ctgtttccca cctgttgcct ctgacttcct 79200
actgtcagat caaggacagt acccatgcta gatgtgctag atggcccttg ttttcttgaa 79260
ttaggagaaa tcaaaactat atgaacatcc tgcggttcct ggccctggga gacaagaagc 79320
taagcccatt taacatcacc caccgcttca cccacctctt ctggcttggg gatctcaact 79380
accgcgtgga gctgcccact tgggtaagga gactccacct tgggtcgagc cataggggac 79440
agaggctctc aggagtcggc tgtggacaga aaatcaaaga gacatgtgaa cactcagaac 79500
caggctaagg tagcagctct tcgcaggatc tcaatacagc tcctgtcatg gatcctctgc 79560
ctttcttcat agttcccttc tgcttgtgcc tcctttgccc caggacttct atggctgccc 79620
cttgtcactg tcactgtctt tccagtcagc cagccagcca gcagcagctg ggcttggtca 79680
ggaagcacag acatcttagg gggccagctc acttatctca acaaaccatg gggtcactgg 79740
tggatccaag gcattgcttg cctgcctaca ggaaagggac caacagatgt gaccacaggg 79800
atgagtctag aacccacaca gttgttttgc atgctcataa aagttgaagc aggagacagc 79860
gtggggtgga acagtgagtg cccttctgtc tagaaatcag agttagcagc ctgcttccct 79920
cccacgcagg gactttaatg tattcccaca ctcgagtcct gggagggaca aaggaagcaa 79980
agcaggagct gaccctgttt tatagtcaaa cacacagaga ggctagggaa gcggctgaga 80040
ggagattgca catcattcat agctgaggtt ggacctcagt tgtcccttat cccaagcatc 80100
tctccatatg acccatgtcc cttggtctcc tttgggcttc cagccaccat cacagctctg 80160
cagccagctc accatgcctt cctcactagg gacccccagg ggacatcttt cctccctggg 80220
aatgaacagg caggtaggag catgagaaga gtttccatcc taagcctcct gaggaaggcc 80280
tgccacagct ccacgctcac cagggcagaa gcccagagcc tggctcaaag ccacagagaa 80340
gtagacagaa gagagaactg ctggagccca aggccagtga attccccagt tctgcactgt 80400
ggcaccaatt cttgtaatta ggctgagtgg ttacaaagag ggtgatgctc agaagcctca 80460
gagccaggga ttccccacca ccactagtgg acttaacact agatgccttt gttcccacag 80520
aaagagccag agaaagtaat ggagaaatac tacctgtctc atgaagtatt gctaacaatg 80580
gtgtttaaaa gctgatgtat aaaaacaaca gatctacatg caaatttggg ctagtctgta 80640
gctaattatg gtgatggtgc agcccttcta ggctttcctt atccttataa agtgattgcg 80700
agtcaggaat cagcatccca gacggatgtg gtcattggga agtgtggtga agaacacgat 80760
ctaacctggc atgtccccgt tccaggaggc agaggccatc atccagaaga tcaagcaaca 80820
gcagtattca gaccttctgg cccacgacca actgctcctg gagaggaagg accagaaggt 80880
cttcctgcac tttggtgagg gcacactgtc tctcctttgc atcttttcct tccatcttct 80940
ctctacctgg acatcctgac caaggacagg gctgtgtctg cagaggaggg aagctggagg 81000

tgctagggaa gccaaactga tggccatgcg tctgtttcag aggaggaaga gatcaccttc 81060

gcccccacct atcgatttga aagactgacc cgggacaagt atgcatacac gaagcagaaa 81120 

gcaacagggg tgagtcctcc cagaagccac tctcctgccc tgtcccacct cctttaccca 81180 

actcactatt ccatggtggt tcctagaaag tgggaagtat cctacctcac cagacaacag 81240 

caaaaacaaa agtcagaatg cacatacagg ggccagggat ggaactcagt catagagtgt 81300

ttgcttagtg tgcacaaagc cctgggttcc atctcagcat caagtccagc atggcaccat 81360

ctatctttga tcccaatact caggaactag aagtaggaag caagaagatc aggggttcaa 81420

ggtcatcctt gactacaatg aatttgaggc cagtctgagc tgcacgagac tttgtcctcc 81480

caccccccaa aagaaaactt tcacatgctg acttatatct tggtacaaaa ctgccaccaa 81540 

cttgtacatt aaaaaataat ccaaaaagct gaattaacaa tgaccctatt ctaagatgct 81600 

gagagaattt taaagcattg tttcttttgt tttgttttgt tttgttttgt tttgttttgt 81660 

tttgttttgt tttgtttgaa acagggtctc actatgaaaa tcctggctgt cctggaactt 81720

actatgtaga ccaggctggc ctcaaactca cagagatcca tttgcctctg ccttctgagt 81780 

gctggcatta aaggcatgca ctactatgcc tggttactag taaggataat tttatgtgta 81840 

ttagtggttt tgcctacatg agcatctggt gcccacagaa gctgaaaggg ggtgtcagat 81900 

ccccagaaac agaatcnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 81960

nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnntgca 82020

agacaaccag agatttaagt actgagccct ctctaaaacc cttttaaaag acaaaaacaa 82080 

agatacacac acacacacac acacacacac acacacacac acacacacac accccacaca 82140

cacagttctg ttagctgaaa ctaaacaggt tctcatgcat ctgagtgtgg tggtgcatac 82200 

ttgtaatccc aatacctggg aggcttaagc aggaagatca taggttcaag gccagcctgg 82260 

actacatggc tagactccaa aacaccacca gcaaaactga tttctttgct tcaggactta 82320 

gaatttcaaa acattgtagt tctccattga taaagccaca ggccaaagta aaccggtcac 82380

agagtgactg ttaatatgtt aataaggcca ctcatgttga aaggaataat gtttggggaa 82440

agatgcaata aaatactctt gcagaggacc tgggttcagt tcccagcacc aacatggctg 82500 

ctcacaacca actgtaaccc caactccagg ggtctaacac tctcttctga cctttgtggg 82560 

cactgcacac atgtgttgca catgcatata tgcaaacaaa gcactcacac acataaaata 82620

aataaatata ttttaaatag tttagcacta tctgagggta aattccctaa agaaccaatg 82680

acagctgaac ttgatgatgg ccgatgtcct cattataggg tggttttgtt tttaataata 82740 

acagagaaag cagcacatgt gctataaaac tgtcatgtta cattgcagat gaagtacaac 82800 

ttgccgtcct ggtgcgaccg agtcctctgg aagtcttacc cgctggtgca tgtggtgtgt 82860

cagtcctatg gtgagtggaa cacggtgggg tgcaggctag gttttgggtt ctgaggacag 82920

tagcaagcca aggggcttca gtctgcttct ccataagatg agcgagtgtc ctgaaagagt 82980 

ttgtgcatct ctatccccct tgggctgcag tgtaataaat cccgctcaga gagagacact 83040

ttaactagaa aggacttttt gttgttttct ttttcttttt caagatttat ttattttata 83100

tatatatata tatatatata tatatatata tatatatata tatgtatata catatgaggg 83160 

catcggatcc catgacagat ggttgtgagc caccatgtgg ttgctgggaa ttgaactcag 83220 

gaccttggac ctcaggaccc ctaatatcat aaacaggacc ttctggtact attcttttta 83280

agagaaccag actgagtttg agccatataa aagtgatatt ttatcaggta taaaacaagc 83340

aactaccagg tattcacatg gttcagccta atgcatatca ttaagtatgc ccatgcagat 83400 

ttggagaggc ctaagaatat tttattgtgg ggctggagat atggctcagc agttaagagc 83460 

acttgctgat ctcccagagg acctcaggtt ggttcccagg acactcactg tgtggtccac 83520 

aacctcgaac ttcagctcca gaggatccaa ctgcctcttc tagatcagga gcatctacac 83580 

acatgcacat gcacacgcac acacacacac atactttaaa agacaaaagg aaatcttaaa 83640 

acacacacac acacacacac acacacacac acacacaaat gtccaggcca tggcactcaa 83700 

ccctcatccc accacaaacc acaggtgagc aagtcataag ccagagacct ctgataggca 83760 

gctcatgacc catgctgtcc agctgagccc tcatgcctac ccatccctga gagtagagaa 83820 

actggttggg aggatttggc cactgactag catccacctt atggtcccct ggagcccatt 83880 

taactccgat cccacagatg aaggtggttt ctgtcccctt catctgtatg tgtctgctgt 83940

tgccctggga tactttatca catgcattac aaccccctcc ccctgctctt gagctctgtc 84000 

tcccagaagt ggcctcacta gataaacttc atatagcacc tgtagaggtg tgacacagaa 84060 

gcatctctta cactcacagc cttggccaag gcactcttcc gccttcagaa cctgaaagtg 84120 

ggacaggagc tatatattgc tcaggggtag agaattttcc tgccatgcat aaggtcctgg 84180 

cttccattcc tagtacttcc caaaccaaac aaacaaaaac gccagaaaga tggtccatct 84240 

cccatccttt accaggactc cttcctcagt gtttcctgct tgtctgcccc caccaacacc 84300 

tgatctctcc agccctgtga ccctctttct agcactacac agagtcagtc atatggaaac 84360 

ctactcccca aatcctgcag ccctgagggc cctctcgatt tctgactcaa aaggtcctgc 84420

caactatctc catcagtcag cgacctctgc cgagggccaa gtgataaagg ctcatagaat 84480

tgccactatc tctttagaca ccagccttgg ccttcagaac tgctccatag cacaggtttt 84540
aagaaagggg aatgcgatgg atcaggaagt ggcaactaag attgaggaga gtaaacatca 84600

cgaacattcc cagcatgtga gaggcaaaaa ggaaaaccaa ccttatatgt aaggctgtgc 84660 

catcttccac aggaaggctg aagatgtata tattttgcta atattctttt gagagccaga 84720 

ggacgcagta gaagggttta aggagttggc catagaggaa gccaagaagt tggggaaaga 84780

aaaaagaatt catatctaga atgcaagaaa gggggccctg ggacgggcgg tggtggcgca 84840

cgcctttaat cccagcactt gggaggcaga gacaggcaaa tttctgagtt tgaggccagc 84900 

ctggtctaca aagtgagttc caggacagcc aagactatac agagaaaccc tgtctcggaa 84960 

aaacaaaaca aaacaaaaca aaacaaaaca aaacaaaaca aaaacaaaaa caaaaacaaa 85020

aacaaaaaca agaaaggggg ccctgggttt tcttccaact ggttcataat cgggtgtatc 85080 

aactatctat cgctgcatag caaatgccct aaatcccagt tcttaaaaca ctgagtcctg 85140 

ggctggagaa atggctcaga ggttaagagc ctgagttcaa ttcccagcaa tcacatcaca 85200 

tggtggctca caaccatctg taatgggatc tgatgccctc ttctggtgtg tctgaagaca 85260 

actacagtgt actcatatgc ataaaaataa ataaacaaat ctaaaacaac aacaaaccac 85320

tgagtccttt gctcatgatc ctttggctgc actgaattca gacagctcat ttcttctctt 85380 

tggtgtcagg tggacacact gaagtatttg atcagatctg gggttggctg actgtcagca 85440

agggtgatgg cgatgcccag gaggcaagtt caggcacctt tgcttcttgg actcagggtc 85500 

cccagactag gtctcagctt gtccctacag tgagtttgtc atccacacct tggtcaaaag 85560 

gtctcctcta tccctactta gaaatgcctt ctctcccttg caggcagtac cagtgacatc 85620 

atgacgagtg accacagccc tgtctttgcc acgtttgaag caggagtcac atctcaattc 85680

gtctccaaga atggtaagca atgggcaacg tcagcttttc ttgttttcct caaagacaag 85740 

gggcctaggg catttgtcat ctggttgcag ctaccaattg tctgggttag atgtaggcct 85800 

atcctttcct tcccaaggcc catggtctgc cctacctgat ctcttcatgt tcaggccaca 85860 

tcacaatctt agaccagaaa accatatata tatagtgttt taagtaagtt tcttctggat 85920 

aagcaagtgc ttcaagtttc ttccaattgt ggtccaattt atttaattcc atctaagttt 85980

aaaaacggac tctaattaaa gaaaatatta gtccagtgtt agtgtgggtg catagtaggt 86040

tgagatggtg aaagctatgt cggtgggatg tggatgaagg aggggaggga aactggaata 86100 

aactcatgct ttgagccaca ggtatggttg caaattgccc agcccagccc tggttgctga 86160 

gctacccttt ctgcagcaat ggctgggtgt gcctgttaac aagtccatcc ggtcctatct 86220 

taggtgacca ggaagccatg ctgtcatctc ctctacccac ttctgtgcag cagataactg 86280 

atttagggat gctgggtatg ggtaacccaa atacagacag agaacagctg tctcactcta 86340 

tggcctctgt gctgagggat gtagacgaag ctagacctch gtctttccaa ccattggatc 86400

aaacatctgt ggatctgatg tgacctccct tctccaccca agtcttcaca cctgtccagt 86460

ctccttccta catctggcca cctttatagc ctttaggctc caccccttcc tgctcatacc 86520 

tgtcccctcc tttgagtttt tcagcattga aagagtagcc tctatcacct ccctctttgt 86580 

cgggcttggt ttccttctgc tgggattcag cacacagcag gtgcttgaaa atattggtca 86640 

ccagtttgga gttttgatgg aacatttgtc taagcagagt ctggccaagg tgatgtgggg 86700 

tttagaagga gagaaagaag actaagaggc atggtggagg ttgctgtcaa gtggcttaga 86760 

atttatatga aaagatagaa cacttgctta aaaccagttc aagatcagtg tggtggtaca 86820 

cttaggtggt ggttctagcc cttaggaggc agaggcagca ggctcagaaa ttcagtttga 86880 

gtctgactaa actacataga gtttgaggcc agtttgggat acatataaca ccctgtccta 86940 

caaaacaaaa acaatgataa aaaacagatg ctgtgtaaat aaaataaatg tcatgtaaat 87000 

tagcatagtt ataaccatga tagctgaagg ccagtaagtc ccagattcca gagtagagac 87060 

tggttataac tgtgtctgag gaagtaagag caggtctccc tagctgtgga ctcaaaacag 87120

cagatttggg tgttgagcct ctaactctct gccagaactt gtggggaaat ttggttgtat 87180

tagtgcatgc ctttaaccct agcagtggag cttaagaact tgatgcagga agatcccaag 87240 

tttgaggcca gccttggcta catagtgaat tctagacaca cctgttctat atagcaaaaa 87300 

ggagagaggg acagagggag aaggaggaag taggaaggga aagacagaca gacaacagag 87360 

acaaagagac agacgggtgg ggggagggag gaagagaatt aattagttgg gagactagag 87420

tgaggttgag aggttgcccg aggccacaca gattattgtc aaagctggga ttgtagccat 87480 

atcttgggtc cttgctgtct ttattcttcc tgacctgcac cctacatgaa cagcttgtga 87540 

cagaccagag aatctgtggc catcaacaga gatgctgagc ttgggtcagg aagctttcct 87600 

ttaagacaaa tctctccttc caacctggac atttcccatc tgtgggccct cagagggagc 87660 

ctggcagagt gggtgcacag agaggaaaac cccagaaaga agctggaacc tccatcttca 87720

gttcaggtac agtatacaaa taaatcctag tctcccatgc tccaggtcct ggcactgtag 87780 

atagccaagg gcagatcgag tttcttgcat gctacgccac actgaagacc aagtcccaga 87840 

ctaagttcta cttggagttc cactcaagct gcttagagag taagtgcctt gtgaactacc 87900 

ctttggggag gttctttcac ataactacct ttccattgat aactggttga tccctcaccc 87960 

agttcattcg ttttctctct ctctctctct ctctctctct ctctctctct ctctctctct 88020 

ctctctctct ctctcataag aatccatgga tggacccctc agtgggttca gcaaaggtgg 88080 

ggtacctatg gggctaccca gtggctatgt ggccctttcc ccacccctgt ggtttgttct 88140 

tctactgcct cagccctaga gatagaggca gccactggtg tccatgtccc tgtcccctag 88200

gtcccccctg tccctgactt gtccatcaac attctctagg cagaatccat ttgagtttcc 88260

accaagctgg gtttgagttt ccattaagct gggtttgagt ttccaccaag ctgggttctt 88320

cttgctcagt gaaagccctg ggcaaagctc cagaatgctg aagatctaac agggcctaaa 88380

tagctgctac tgcctgcttg ctgttgccct ggccctcccc ttgccctttc catattgtgg 88440

ctcttgtgtt gtggggctca ctgtgactgg gagcaaggta aagatagaca gcccatttca 88500

ggtggcaacg ttgagcactc accatgctta tgaaagatcc tttctataga acatctccag 88560

acatactgtc tgagactcct atgcaagaaa actatatgct ttctaagtct tcctgagatg 88620

gttcaaagag tgatggggaa ggataggaac caaccttgca aacagaggca cagactgtct 88680

gcttgtagag cccaggagag ctaccagcct tctaccccct cacacctttt tgcaccttcc 88740

ttcaatgcat cccgaacctt cctgggatta tgccaccttc accacaacat cctgaagccc 88800

cagctcagca gccagggtca aaacagttcc aactggaatg aggggctttg tgcgggtgtt 88860

tttaataaac taccgagaca tttgcataga tttctatgga aacagcatat taggaggctt 88920

aaattagaat ttcaaaaggc tgctatcttg agtgtcgtgg gaagtgggac gagctgtcat 88980

gcttcccagt cttcctgcct gcggtgtgat gacttctttg ttgcctggga tgttgcaaag 89040

aggagtcaaa aaggaggggg aaatctcttt gatttttcct aaaaataatt agactgtgtt 89100

ggcaaatgac catccttcaa acaaaaacag aatccccaga aagcctgact cctaattggt 89160

ttggaggcct cagagacatg agaagggaaa gacagtcctc cgtggtcgag gaaggcgacc 89220

ccttggagac ttttctgcct ggttctcact ctgtggtctg caaacccagc ttcttcttct 89280

ctaacaggtt ttgtcaagag tcaggaagga gagaatgaag agggaagtga aggagagctg 89340

gtggtacggt ttggagagac tcttcccaag gtaatccagg aagaaaatgt gcctggggca 89400

gagggctgca gcagtgcagg ttagtacgca gcactgcagg ctagtgtgca gctacagcag 89460

tgcaggcgag tattcagctg cagcagtgca ggtgagtatt cagctgcagc agtgcaggcg 89520

agtgtgcagc tacagcagtg caggcgagtg tgcagctgca gtagtgcagg ctagtgggca 89580

gcagtgcagg ttggtggaca gctgcagcag tgcaagttgg tgtgcagcag tgcaggttag 89640

tgggcagctg caacagtgca agcgaatatg cagctgcagc agtgcaggtt agtgtgcagc 89700

tgcagctctg taggctagtg tacagctgta gctggtctgc ccagagtaca gctacagcag 89760

gttcatccag actgtagcca tcccagacca aagccataga gaccagctta gacactgtag 89820

tcaagggact tagggtatgc ttttcatcat ggatggatga cctcaatatt cctaagtggt 89880

tttcagccct gctgtgccta ctcccctttg gttattacct acttagtgtt cgcagcactt 89940

gagattgaat cgaggctcct gcaagcacta agcaagcacc caaccattga gctatgtccc 90000

tgatttccct gaacactttt tagcagcagg aagttgaggg cagtgctgac atccaggagt 90060

ctgtcactca ccaagccaca tctctagcaa accacagaat atttcaagcc caactggagg 90120

tttgtgacac atttactcag acagggagca ggcctatggt gtcttctcta atgcagccat 90180

agaccttgga ggggtatgag gagagggatg gtactgcaag tctcatttca gacctggctg 90240

ctagaccttc ccttggcctc tgggatatgc aaagaatgga aggaagaatg gaggcaatgg 90300

accagggacc tggaagctca actcaaccca gtttaggaaa tggggtgggg cagggaagaa 90360

aactatctct cacttggctg agcaatgggg tccacctgct ctcttgggca tgaagagcca 90420

agacctgagt ttgcttaaaa gaacatattc agttccgttt tactatgtat ttatttattt 90480

tatgaatgag tgctctatct gcatgtacat ctgcatgcca gaagagggta ttaggttcca 90540

ttacagatgg ttgtgagcca ccatgtggtt gctgggaatt gaatgcagga cctctggaag 90600

agcagccagt gctcttaacc actaagcctc tctccagctc tcctgatgca tgcaaacacc 90660

aagactgctg acccccattt acctgtcctg acctgccctc acctgtggac tctaatctgg 90720

tgtctcctct gcagctaaag cccattatct ctgaccccga gtacttactg gaccagcata 90780

tcctgatcag cattaaatcc tctgacagtg acgagtccta tggtaagcat cccaggccag 90840

ggacctcacc gtaccttagc ctcagcctct gaagtgaggg tgatggtggc ttagcctttc 90900

aagaaagcct gtctgccacc acccattgtg ccctgatgct gccacttcct ctgtcccaga 90960

gggacttttc ctagtgggtt accatccctg ctccttgtca cttctgggtg ccacagccag 91020

taacctcctg cttagccttg ctgagtgagc tgcaggtacc atgtaattta atcttagcct 91080

taagaggcag gttctgttcc ccagtgagta aataggtaaa ctgaggctca tagaggtcaa 91140

gtgactcccc ccaggtcaca cagatagtcc tcaactgagc tgtggcttga gcctgagtct 91200

gcctgtgcca ccgtgtctct cgttagtccc tagccagctc ctcagaccca ggtgcagaac 91260

attagcccgt ccctgctgcc aggtcagctt tccagtctgt ctccttttcc tttcagactt 91320

ttccttgtat agagatttta gactgactta cactcttacc tgtttatctg attttgttta 91380

tttcttgtac tgtttcaaac cttgggaagt aagctttccc tatagcaacc agattaacat 91440

ttaataccat ttcttagatg gatttttcta ggatttaatt tttccctcag cattgatctg 91500

tatagagttc tttttcctat gaggcttcag acatttgccc cagattgcta aggcatcgat 91560

tgaaagggtt tgttggcaat ccctgcccgg ccacttgctt caaggagctg tctcccagtg 91620

gtacagccat ctctctgagc atgctctata tccgtctaat gccaagatgt tcctattgat 91680

gctctctctc ctttccccag gtgaaggctg cattgccctt cgcttggaga ccacagaggc 91740

tcagcatcct atctacacgc ctctcaccca ccatggggag atgactggcc acttcagggg 91800

agagattaag ctgcagacct cccagggcaa gatgagggag aagctctatg gtaggtcagc 91860 

cagcctcctc ctggctttcc cagaggccac tgcactaagg acatgttctt tctctcagca 91920

aagctatcaa gtatctatga tctgtgcctt cagaatatag cgtggttaga aagtctctag 91980 

ggtagcgtgg tgatcctggc actcaggagg ctgaggcgag aagatcgtga gttcaaggct 92040

agtctgggtg acatactgag acgatgtctt taatgagaag aaaacaagtc atctcattta 92100 

tctctcacaa ccgccctgtg agtcaagtat ggttatctct acttggcaga tggagaaact 92160 

ggagctagat gactggctta atgttgcaaa ctcaggctaa gccaagttca tccagggcca 92220 

agcaggagct ggtagacata gtggttgctt gcctgtcatg aaaatagcaa gtgtaggaac 92280 

gcacaacatg gggtgcaggg gctccttatg ccagggaggg ctctaagggc cactgcttct 92340

gtgcatgtga ctctcctgtg ccccatgaga gatctttctt ttctctagca ggacagtggg 92400

ttgaacagat agtgggtcat tacctagcaa ccacattcca gaaataaagc ccggggttcc 92460 

tttaaactga tttatagggg ttcccataaa cagataagct ctgcatggga agtachctta 92520

tagggtaacc taatagctgt tctaagtttg ttctagtctg acaagaaatg tctgtttbat 92580 

ccatggctaa gctaacaagg tatttcttga aataacaact gggtcataag tatctgttca 92640

tctccttctg gacccatggc aggcatcatg gcaggcatca tggcaggcaa cagttgccaa 92700 

gcgtaaggat tttgaagcct gaaaatcttg cttgatatac ctctcccagt aaccactatt 92760 

tactgtccag ctttggcctg caaagagata ccacattgcc atttctaacc catgccgtgg 92820

ctaggatctc acatgttttc tttcctaaat gttctcttcc agactttgtg aagacagagc 92880 

gggatgaatc cagtggaatg aaatgcttga agaacctcac cagccatgac cctatgaggc 92940

aatgggagcc ttctggcagg tagacgaagc ttgctaagac ttattacagc tagctgggct 93000

gtgtgaacca gagtccagga gagggtaaag tggagttcag gaaagccaag gtcagaggag 93060

aagaaatgtt gtggcccagg ggacccactc tcctacctca gtttcacatc ctgagttcaa 93120 

atcctcatct ctgaaagatc aaagccttgg agaacatttc catgtaggaa gggctcagct 93180

caactatcct ttaatgaaca cctgctatat tcaggaagac acagtgtaaa tacagtatag 93240 

tccatgacct acagacttaa caccagccag tggaaggatg tgcagtgcag gaagaaggag 93300

ctatagatga ctgaacaatt ccagggcaaa agatacttct cctgcattca gggaggccag 93360 

gaacataagt ggtcaaaagt caccacccat gacttcctag cttcatttct gttgctgtgg 93420 

caaatgccct gattaaaagg tagcatgggg agggaaggga tcatttgact tcctcaagcc 93480

cacattacag tctagtgttg tggagaagcc agggcaggaa cctgaagtat cacatcaaca 93540

gtcaagagca cagagaataa acacatgcac ccttgcttgt tctcagttag ttttcttcac 93600 

tcttacacag ttcaaggccc agccaaggaa atggtgccac ccacaatgga ctgagtcttc 93660

ctacattaat taacaatgaa gacaaghccc tatagacatg cccacaagcc aaacttatct 93720

agtcaattct tcataaagac tctcttccct ggtgatccta gattgtgtca agagggcatt 93780 

tcaaaccaac caacacagag gggtgtagcg gggcatgcct ttacttccag aactcaggac 93840 

gtagacgggc tgatcttgtg agttcacggc cagcgttgtc tacacagtga attccaggac 93900 

attcagagct acataatgag accctatctc aagaaacaaa ataaaacaaa acacagaaag 93960

actaaccaac acagtgacta acaggaaaga cggattggag agtaacctgg gcattccagt 94020

ctatgttgtt tcctgatatc agaaacaccc ctgtttcctg atgtgctcgt gttagtcctc 94080 

ctgagatgga gcccattgtg gaacaaggtt tggtgtgtga ctacaaaggt tgcctcacct 94140

ggctctggac tgtctcctcc agttttaaca cagctgccct ccttaccacc tctagcctca 94200 

ccctctgtgc cgcacactca ccctgctggt ctgatgcctt ccgtgggatg tgcaaaccaa 94260 

catgtcccac gtgactctcc cacagcccca ggcccaccag gttctcattc tgctctcccc 94320

atgtgctgac cttgtcttca gtgaacccca caacagggca gcctctatcc ctgaccacag 94380

ggccttagca tggacctttg cctccacaca gagcatcctc ctctcttttt ctctcttctg 94440 

tactccatat aaattcaatc cgcagacagc cttgagtcaa tcttctccag ccatgttcta 94500 

cctggtgggt ctccactctc ctacttgtct caacttgatc tcttatttct ctttgccata 94560

ttatacacgt accagaatga gtaaatgaat tgttgtgccc aagaaataaa ggtactgaga 94620

ggctgagaag ctgaggaaca ttttcctcag gctggtgcca gccaagaagg gaagaggctt 94680 

caggtgtgcc tccacatctt cctacagggt ccctgcatgt ggtgtctcca gcctcaatga 94740 

gatgatcaat ccaaactaca ttggtatggg gccttttgga cagcccctgc atgggaaatc 94800

aaccctgtcc ccagatcagc aactcacagc ttggagttat gaccagctac ccaaagactc 94860 

ctccctgggg cctgggaggg gggagggtcc tccaacccct ccctcccaac cacctctgtc 94920

gccaaagaag ttttcatctt ccacagccaa ccgaggtccc tgccccaggg tgcaagaggc 94980 

aaggtgagtg tcctctgaat tgtgtgtgtg tgtctgtctg agtgtctgtg tttctgctca 95040

aaagcatccc ttgggggctc ccaagggtga cggcctgaag agggcagagt tgtagtaggt 95100 

tctgcccact actttggctt ctgcctgtcc aaaacagttg gtgcaacttg attttaaagg 95160 

ggactgggtt gggacttgcc aagaaagcca tcttcttata aaaactgcat ttactcacaa 95220 

agacatttct caaaccaatg acaatctatg ccatcctccc cttcctggag ttctcagtgg 95280

gaaagaggtg aggatttctg aatagacacc attatctacc ccaaatctct ctctttcctt 95340 
aaaaacaagc attcatgttc tcatatataa ctatctgggg gctggagaga tgactcagtg 95400

gttgagagca ctgactgctc tttcggaggt cctgagttca attcccagca accacatggt 95460 

ggctcacaac catctgtagt gagatctgat gccctcttct ggtgtgtctg agaagaatga 95520 

cagtgcactc acatacataa gtaagaaaat aactatctgg accctcctga tggaaacata 95580 

tattggaaac taggaaccca aatgaaggca gagctgtcat cctacagagg gagccggcca 95640

gaacaggttt aggagcaagg accacacagc ccagagatga agtcctaagc agatatggga 95700 

aacattaggt gggaatccca ttcctacagg atgatagatg gccaagtgac atccagacct 95760 

agttaacagt gacacagatg tgtctcctcc ccagctgcct tgagcatttt gtggtgacac 95820 

cggctctcac attcagcctc ttttctgagt gctgtttctt ttcttccctt ttcagaactt 95880

cagttcctta gcagagctaa ctcatagtaa tcagggaccc gtgctggagc atcacccaca 95940 

gtgcggttct cctccagacc ctctaagtca aatgcctata ccaggcctgc ctggggattc 96000 

ctgtggagag gcactgacag tcctgtgcct tagctgttag ctgaactagc tgaaaggtgg 96060 

gagggcaggt cccttagcca gactgaagtc tacttcctag gagcaaggga gaaatcgcct 96120 

tggcatccct ccccggaaat gaggatcaag gtagccatcc agaaggtact gaggtacttg 96180 

tttgacaaag gcagtctctt tcgagacccc atgaagcaga gctagagaca gccacaaaga 96240

aagcacaagt tcatggactc agaggttccc agagtggaag tcactgtgtg cttcacacgg 96300

tgaacagagc ttggtggaaa catgtcctct ccagccccag gtgacagcat aggggtagac 96360 

agctatcagg gagcgagcta caggactagg tggcaatagc cacccatccc aggttcccca 96420 

aggctgcccc aactttagca tttaaaagtc ccatctcctg gaaaacactg cagtcctggt 96480

aagcttggac caccaccatg tgaggtcaag gctgtgtgcc aagaaaggag actctgctca 96540 

ggccctagcc ggcatccatg gtctcctgca caagaactga ctctgccccc tggataagag 96600 

cttggtttcc tgttgcctat actcaaactt tttttttttt ttgactaaaa ctcgtgagtt 96660 

taatctgttt ttcctaacct ccattaggat agcaggtgta ccctgaacac tgctggaagg 96720 

atcactttct agcttctatc tcattggctc catcctctgg cctctctatc tctatagctg 96780 

agtggatagc cagatgccac acacatgcac aggcacgcac acgcacgcac atgggggggt 96840 

ggggagatgg ggggggagtt gcttgcttct gtctaccaca atgccctcag aggccagaag 96900 

aggacattgg attcccaaga actgcagtta catacagttg tgagttacta tgtagatgcc 96960 

aggagtcgaa cctggtaact ctggaagaag agcaaatact cttaatcaca gaaccaactc 97020 

ttcagcccgg catggtgttg catgccttta atcccagcac tcagcagaca gtgaggcggg 97080 

cggatctgtc agttctatat caaccaggga tacgtagtga gccctggtgt ttaaaaagaa 97140

aaattctcaa ggcagaatcc atgtaaaaat tattccctgg aaaataagta attcggaggc 97200 

atttctgtgg tgcatgttat atgatcacta ctgtagttag ttccagagca ttcttatcac 97260 

ccctaaatga aacccagagg catgaagcaa ctactcccca tgtcctgact tctgaacata 97320 

tttcctcccc atctcctctc cttttcccca tggttccaga cctggggatc tgggaaaggt 97380

ggaagctctg ctccaggagg acctgctgct gacgaagccc gagatgtttg agaacccact 97440 

gtatggatcc gtgagttcct tccctaagct ggtgcccagg aaagagcagg agtctcccaa 97500 

gatgctgcgg aaggagcccc cgccctgtcc agacccagga atctcatcac ccagcatcgt 97560 

gctccccaaa gcccaagagg tggagagtgt caaggggaca agcaaacagg cccctgtgcc 97620 

tgtccttggc cccacacccc ggatccgctc ctttacctgt tcttcttctg ctgagggcag 97680 

aatgaccagt ggggacaaga gccaagggaa gcccaaggcc tcagccagtt cccaagcccc 97740

agtgccagtc aagaggcctg tcaagccttc caggtcagaa atgagccagc agacaacacc 97800 

catcccagct ccacggccac ccctgccagt caagagtcct gctgtcctgc agctgcaaca 97860 

ttccaaaggc agagactacc gtgacaacac agaactcccc caccatggca agcaccgcca 97920

agaggagggg ctgcttggca ggactgccat gcaggtatgt tgagctgtat gtatatgggt 97980

gtatatgcat atgtgtgcac gcatgcatct gtgtgtgtgt gtatgtgtat gtgtatcctg 98040

ggatgtgctc tgtgaagaag ggctaaacct ggggtttgta cctggcatct gttcctgttc 98100 

ctgactggtc ctacagggca tctgcagtga ttcttgagag cgacagagag gagctgttga 98160 

cacctgtaga gtaagggagt ctaagtcatt tatattttag tcttatcatc cttaattagg 98220 

gtatgcactt aaaacattcc tggtgtttta gagacctcag gcagaatatc cccttcctat 98280 

agaatagatt tattgtagag caaaactaac aatgacttaa atacacacac acacagagtt 98340 

gagactggac atgatagtac accgctataa tcccagcaca ctgtacatct agagcaagag 98400 

gattggaagt tcaaggcaag cctgggctaa atggtgagac agacaacagc agcagtgaga 98460 

gagctaaggt tatatagctc agtgggagag tgcacaggaa gcctccagaa agcccttagg 98520 

tgaatctcca gtatcacaag gaacagagtc tagaactaga ctgctggcca ttccattagc 98580 

atacttattg cagcattgag ggtcagactt ggggccttac actaggcaag cactgtacca 98640 

cagtcctgtg ttggatttaa atcccaagtc tccactgcac ggctctatgc ctttttaaac 98700 

tgtcaaaaag aaactactag tgccagccat tgtgtagatt tatggggaga aaaagattat 98760 

gccatatact gttattgttt gtgatcattt ttacagttat cactcatttt gaaactatat 98820 

acaacatata tctatctcat aggtactgtg tttgtaatat atgcatgaat taacatataa 98880 

ggcacttagg acagtgtggc tctttcacat taattagcct gacctaactt acagcactag 98940 
gggtcaaatc tggcaaggcc ctgatagccc tttctgttct catcttacag tgagctgctg
gtgatcggag cctggaggaa cagcacaaag cagacctgcg cctctctcag gatgcctctc 
tcaggatgcc tcttggagga cctcctgcta gctcttcttg cctagcttca agtcccaggc 
tgtgtatttt ttttcaggaa acggcctcac ttctctgtgg tccaagaagt gtgctgctgg 
ctgccacact gtgcggcaga tgctaaagct ggatgacaaa cgcacgccat acagacagca 
gacagcggca ctgggtctca gaacttggat tcctgggcct tcttccagtc gccgttttaa 
agaaaggaac taacggagct gctcatccga gggtgaagat ataaataata atattattaa 
taataataac agtcaggtgc catgtgctgt gttaagtgct ttatgaacat ttgtcgggct 
ggcctccagt gctgaggtgc cagtcagcct gaaccctatg cccaggccca ctaatcccaa 
atggtgggtc ctgagatgtt tttaaaaagc attaaagaaa accatcggtc tcttagagct 
aaccggccgg gctctactgc agggacccga acagtctgca tggctaagtg gcacaaggag 
cctggccctg tccagcttca gagatccaag ctgctttttg ctggggttct gtcacaggcc 
tgatcctctt ggtttttatg gggtttcaag tctgccagag tcagaaatca gctctaactc 
gccagtgaag agatctggcc ttaacttaag ccagccacgt caggcccctg ctgagcctat 
ggaccaataa atactccccg tgccactgga ggtgggcagc tatcaccata ccctgagttg 
ggccaagccc accccacccc taccctgcaa catttctgat gtactgagga agagtctcca 
ccatagtccc caagggctga gttctccagc ctgctatcag ggaaggtgag cattggtccc 
aggctctcaa aatagtgcag cctcttcttc ccaagctctg gggtgcaccc tgtgtccttg 
gttaccagga gactagggtt gtgatatctt ttcttgtctt gctttttgat atatcaggat 
taatgtagga aaccagacct agattattca ggagagtagg tatatcccct gtgtttccca 



<210> 2 

<211> 11928 

<212> DNA

<213> Mus musculus 

<400> 2 

gcttcccgag gagcacctga aagccatcca ggattatctg agcactcagc tcctcctgga 60 
ttccgacttt ttgaagacgg gctccagcaa cctccctcac ctgaagaagc tgatgtcact 120
gctctgcaag gagctccatg ggtaacggag agccctgaga gaggggtggg ggagcttcat 180 
gggtaatggg agcccctacc cacccaggag gatggccaca gcaagagaaa gtgctcatta 240
gagtgaccct gggtctcctc tctgtccaga tgtctctgca gcactcacag taattggccc 300 
aggtggagtc tggaatgttc caggcttgtt ggaagctctt gctctcatag aatctgagct 360 
ctaactgagc tgggaaagtt gatcatttgt ttattccttt tagggtattg ggggggcacg 420 
gatgtatgca tgctggtatg tatgcatgct ggggatgtag ggcctcatgt gtgctaaata 480
catgtgccta ttctgtaatg ttttcttgtt tgtttttaac tcaaattaat tagaggcagt 540 
ttctctgttt aaaaaaaaaa ggtggaagac actgccatct catttgtgtt gggacttgag 600 
atatatatat atatatatat atatatatat atatatatat aatttattta tttatttatt 660 
tgtgtatgtg tgtataagaa agcgggggac actcatgtgc catggtgtgt atggggacac 720 
acgtgtcctg tgggggaaca ttgtgtcagg ggaacaagca tgtgctgagg gtggggggat 780
atatatgtgc cactgtggtg atttggttgg gggagacatg tatgtgctat gatacctgtg 840 
tgtaggacag aggacagact cacacaggag ccttcttacc tgctgagctg tctcagcagt 900 
ccaagccctc caggctgtag ccactcttcc tccttgctac agtcccacat ccagccaaca 960 
cataaaggct ctggcaggaa ataaattaaa tttgctttgt gtgtgggtgc tgacggctca 1020 
agtcttccgg tgatggtaat gtttaaagca agccaacatc agttctccag tgccgaagtt 1080
attgaatgac tgaccaatgg gtaacactta ggatttttaa aaattatttt gattgtacat 1140
tttgaattac attaatctta aaataaacta caaacataaa cacactgtgg ttggagctgg 1200
gcatagtggc acatgccttt aattccagaa cttgggaggc agaggctggc atatctctat 1260 
gactttgagg ccagcctggt ctatataatg agttccagga cagagagatc ctgtctcaca 1320 
aacagagaaa accccataac tatactttta ttgtactggt tggtctacag tgtaaagaat 1380
tggcaaagaa tatgaaagat tacactggga agaaactgaa agccatccag agtaatgaag 1440
caaacccctc acaaatgtgg gaaacatttc acatgggtga gggcttgcag ccagcctgtg 1500 
ctaatttatg ttttggtacg tgagcactta gagcagtttc cagttctctg gtgctctacc 1560 
aatcttagct tggttaatat tcagggatga gtcttccatc accggtaatc taatttgcca 1620 
ttctatttaa aggctcttaa aggcacaggc agtgcattgg gtaaatgtgg caaataattt 1680 
ctttgacaaa tcgaccaatt gtcagattgg cctgctagtc atttgtttca atgagaactg 1740 
ttttttctca aaggatgctc ttgtacaccg gctagaagca gggctgtcat ttttataggt 1800
ctctgtggta ttttgttgtt gttgttctgt tgttgcaaat catttatcac tgagggaaaa 1860 
tacacacaaa gggccctttc tttaaaagta tacatgtatc attttgtgac agctcataag 1920
aagctgtttt tttctgcctg gacacaggtc ctgacctgtg ctgtgtcctt gctaagcttt 1980
gtcagaccct tccacagctc cccccaacaa cgagttcccc agtacctgcc tcacctcatc 2040
actatggtga cagcagcctc tgatgcgcct gactctctgg cacattatgg cagtgttaaa 2100 
agcttccatc tctctctttg ctgaataatg aacctcaggt tgttcaagag accggaatgt 2160 
tcttcacctg cctgcacaca tctcttcact ttcttttata gatcaggtag ggactgggcg 2220
tgtagatgga acaaactgtt ttccgttccc cagccatctc tgcaggtgca ctccacataa 2280 
atcaagtgtt aaaagtgctt tgattaaaca ggacaggcgc gttcttgagt tcatctgttc 2340 
acatactgtc tggcaagcgc tgactgaggg tctcctctgt accctgttct gagaactaac 2400 
aaaagacgaa tcaacataca gaaaactgtt atttagtgac tgattaaact aacgaaggca 2460 
tgggctggag aaataactca gcagttaaga acatttgctg atcttgcaga ggacctgggt 2520
tgggttccta gcacacacag ggacagttcc agtcccggtg tgtccttttc tcacttctgt 2580 
ggacacaagt tttacacata gtgcacacac atacactcac atatataaaa cagaacattt 2640 
aaaagtatgt ttaaataacg gaatcattta tataggtttt catttacata ggtaaatagg 2700 
caaaaatctg cattttattg tttctaagtt ttaatttatt tttctctgtg tgtacatacg 2760 
catgcctcct tatctgtatg tgcgtgcact gtgtgcatgc atgaacccac agagaccaga 2820 
agagtaccac agattctctg gagctggagt gattgatagg ctgttgggag ccactccaca 2880 
tggggattca gagttgaact tcgttctctg caagaacagc cagctcttaa ctgatggctt 2940
ttacctccag ccaccttttc ctcattttta aaatttcctt ccttcctttt tgagacaggg 3000 
tctcaatact tagctcatcc caacttgacc ccactcttct cttgccttag tcaccacaat 3060
gtttagttta taagcatgcg tcactatgcc cggctttaaa taaactcacc cataatccca 3120 
gcactgaagt agacaaaagg gaggatcgat ggggctgact ggccacaagc cgtgcttcaa 3180
gttcaatgaa gaccctgtct caagggaata aggcacagag gatagagcca tacgcctgac 3240 
ctcctcctct ggcctctacc caggcacatg tgcatacaca caccacacac acacacacac 3300 
acacacacac acacacacac agagagagag agagagagag agagagagag agagagagag 3360 
agagaaactt tttcctcttt ttttttaaaa atattattta tttcatgtat atgagtacac 3420 
tgttgctgtc ttcagacacc ccaaaagagg gcatcagatc ccattacaaa tggttgtgag 3480
ccaccatgtg gttgctggga attgaactca ggacctctgg aaaagcagtc agtgctctta 3540 
accatctctt cagccccaca aagaaacttt taatgagcaa ataattgctt ccaagtaaat 3600
actactaata tatttctaac catactatac aaggaattat taaagaacgg ataataggag 3660 
aataaaaaat tataagtcac tttataatgc tatctaatcc atctagaaca aaaacactgt 3720
aataatgcaa aagagcgcag tgcctagatt aaataaataa aatgcagacc aataagtaaa 3780 
ctttatagca gcacatggaa atgacgaaat tcctaacaaa aagctcaaga tgggcagttt 3840
atttaaagtg aaatacagga gaaataaagc acagaaagat actcaaaggc atagaagtta 3900 
acataggggg gctggcgaga tggctcaaca ggtaagagca cccgactgct cttccaaagg 3960 
tcctgagttc aaatcccagc aaccacatgg tggctcacaa ccatccgtaa tgagatttga 4020 
ctccctcttc tggtgtgtct gaagacagct acagtgtact taacatataa taaataaata 4080 
aaactttaaa aaaaaaatta aagaagttaa catagaagcc cactcaggac cccactcagt 4140 
cctagagtat gacattatta tggacattaa aaagagaaaa ttcagcagta gtgtgcatgc 4200 
actgcatata tacacaaatc cttgagtttc ataccaaatg cctttagacc acttgtggct 4260 
ctgcaaacct gtaatcctag cacttgtgaa aaggtcagct caggaacttt gggaaggtca 4320 
tgaaactctt gcccctccag aagggagagg ctaattaaca tttctcagac cacagggcgg 4380 
gaaccgacct gcgggtgggg acagactgtt gcccatttcc agactaggga agtccttgtc 4440
acctcattcc ctaaagacca atcaatttaa agggtgcact gttccgccaa tcatattgtg 4500 
cctagttgct gatgctctat tctgccctta gaaaccgtat aaaaactagc gaaggggtac 4560 
caggggtaac cccctctcct tcaggtctgg gacaatccca ctacactgga acaataaatt 4620
cctcttgctt tttgcattga tcacagctcc acttcgtggt aagctaagac tccctggagt 4680 
cttacattgg caaatgcagg caaaagaatc cgaaactcaa ggtcatctaa aactacatag 4740 
caagcatgct gctagcctgg gctccatgag accctggggg aggggcagag ggagaccgtt 4800
cagaagacag tcaagatgtt gcagcagcac aggcagcctg gccaccagtg ctgtcaccag 4860 
acatgttaat gttggaataa agcctcaatc atgactctcc cagttttata attggaaata 4920 
agaaaggaaa gactatagga acaactgtgt tcagaacact atttataata gcaaagatct 4980 
cagagtaacc caaacttcta gacattgatt tgggaagatc tcttggcagc ttattttgaa 5040 
aactttacaa tgttaaatat gtaaaaacaa ggacagtttt gttttttgtt ttgttttgtt 5100 
ttgttttagg gatatatatt catatatgta tatgaatgaa aacccaaact taaaattccc 5160 
cactatgctt taaaggcttt ctgacaataa cagaaagaga aatagagaat ccataaaaac 5220 
tagttctgaa actatcaata ggcttgacac tctttagctg ccaggagagc tgaatctgaa 5280 
cacagggaac cccacccagc accccaaatt tggattattg ttttatttta tctttcccct 5340
acccccaaga cagggtttct ctgtgtggtc ctggctgtcc tggaactcgg agatcctctg 5400 
cctctctgcc tctctgcctc tctctctctc tgcctctctc tgcctctctc tctctctctc 5460 
tctgcctctc tctctctctc tgcccctcgc tgcctctctc tgcctctctc tgcccctctc 5520 
tgcccctctc tgcccctctc tcttcccctc tctgcctctc tctctgcccc tctctgcctc 5580
tctgcctctg cctcctgagt gctgggattt aaaggcatca gccatcactt ccagcttcct 5640 
ttatcatttt aaaaagaatt tcctatgtga ctactgtatt taaatcacca cacggccaat 5700 
actccccccc aactcctccc aaatcccctc tacccactca aattcttatc ttgtattctt 5760 
tatcattatt atacatatgt gtatatatgt gtgtgtgtat atatatatat actatatact 5820 
gctaatgagt aacatttagt gttattcatt gttgcatgtt ttcaatgtgc tttccaggag 5880
gctgggggga tggctcagtg ggcaaaattc tagctgcaca agcctaagga ccagggttca 5940 
gatccccaat ataaaggctg gctggacatg gtggcttgcc tatgatacta gcatgcttgc 6000 
tggaagcaaa gacagggaat ccctggagac ttagaatctc agaagtgatc tgggctggac 6060 
agactagctg aactggccag ctctgggttc atcaagaaac cctacctcca taacataaag 6120
tgtgatggag aaaggcacct aatgtcaacc tcaaacccct acctgcatgt gcacacacat 6180 
acatccacac cacacacaca cacacacaca cacacacaca ccacacacac acacacacaa 6240
ataaataagt aaataaataa aatatttagc tctccagacc aaatcttggt gaaacccatg 6300
catttgcatt tgtgtgtgtc ctacaaacac tgaaggttaa gaagcatgct ccttagtaat 6360 
tttatagcag tttgcgtttc cagattgaaa acagattcta taggctacac agtgctaaat 6420
ggattatgct cagatacaga ttgaaaagga tacagattga aaagggtcgg ggtctgggcc 6480 
aggatgacgg gccaactgat ctttgccggg gcttgtcctt cagggaaggg ttacaggatt 6540 
caccactggg gtgtggccta tctgctgtta ggacctgaat tgcctggagt gtttctagtt 6600 
cccactagtt gttgaacttt accttgaacc tctgctccca gggaagtcat caggactctg 6660 
ccatccctgg agtctctgca gaggttgttt gaccaacagc tctccccagg ccttcgccca 6720
cgacctcagg taaggggttt ggatttggaa agatgcaatt gctataggag ggactctgaa 6780 
ggcagacaga cgcaccgcct cctcacgttg gctagtctaa tataaacatc gcggtggatg 6840
gtgaggatag actccatgcc cttttgtgaa ggcatttcct ggcatcagct cctgacttca 6900 
gacagtttca cccatcagac aaattgcctg gtgttggagg aggaggtgag cagggccatt 6960 
cccatcattt ctcctcagaa atggaaaggc aaggaaaaca tgaggttctt cagacactta 7020 
atccctggga ctgcaaaatg gtggtgcccc tcctccacag ctgctcacgg cggggcagga 7080
gatgagggcc aaatgaagca tagatctagc tatttttttt ttagtgcctt cagtaaattt 7140 
aaaatcaaat aagggaaggg acctagatct ttatgttatg gcattgttaa aagtgagaac 7200 
ttgtagccag ggtgtggtgg cgcacacctt taatcccagc acttgggagg cagaggcagg 7260 
cggatttctg agtttgaggc cagcctggtc tacagagtga gttccaggac agccagggct 7320 
acacaggttt ctctatacag agaaacctgt ctcgaaaaac caagaaaaaa aagaaagaaa 7380 
aagaaagaaa gaaagaaaga aagaaagaaa ggaaggaagg aaggaaggaa ggaagaaaga 7440
ggacaacatg gtctaggggt cagagagcag aatctccaaa aacaccaaca atgcctgctg 7500 
taaatgtatg tcgttgattt ggggatgttg gcctccagct caccatttcc tgccttagcc 7560 
tccaaagtgc taggattata ggcttgagcc aacacatctg gcttacgcct attgtgtgtg 7620 
gaaggggagt gctgagtgtg ctcctgtgtt tggtacttat atatgaatat atgtatatac 7680 
gcatgtacgc atacttgcat gtgaaggcca gaggccaatg tcagctgctc tcatcttatc 7740 
ctttttatta cattgtattt atttgtttgt ttgtttgttt gtttcatcgt acgcatgcag 7800 
ccactcatga gcatgtaaca gcacaggtat gaaggtagac ttgcaggagt cagttctctc 7860 
cttctgtcac ttgagttcca ggaccacact ccagccccca ggcctgggct gtaagagagc 7920
catcttactg gtcctctact ttgtcttctg agatagcatc tagactcacg gaacctggag 7980 
ctcatctaga tttacattgg ctggccagct gatgcatttt aaggtcaaat cttcattcca 8040 
tccctacccc acttccactc ccagtgctgg agttcgggac acctgccacc aagcccagtt 8100 
tttcctggat gcagaagctc caaactcagg ttcccatgtt cgcatggcag gcacattttc 8160 
agttaagcct tccccccagc tcctttaccc tggtctctga atggggggga ggctataaat 8220 
caggctgctc tcagacatta ggtaggaaat agaccatata catgaggaaa gatattcacc 8280
tgccccatgg gtaccaggaa gtgatgtcca actcctcttt tgcttatcag gagaaatgct 8340 
gactactacc tctggtaatt ttgatgttgg gaggaacagg gacattcata ggaccccatt 8400 
cctcgctggt gagagtggag acaggttttc tgaagggcag gagatctgtg tagaaaagat 8460 
ggatgctgtt ttctgaaggg aaatggaggt agagtcgacc tgggagagag gggaggtggg 8520
ggatggttgg gaggaatgaa aggaagagag acggcagttg ggatgtattg tataagagaa 8580 
gaatcagaaa gaaaaaaaga aaagctacct gcacccttca agtgttcctc tgtgtgggag 8640 
gctgtctcag ggactacatg ggcaccgaga ggcatcagtg agggtaggta cttgatgttg 8700 
tgtccctgaa aacaaggaca ggaaatctgc tgcatggcct aagatggcaa aatgtggcac 8760 
aatcaagtaa ggcccaggat tctgtctgtg gtgcagacct gctgtagaat gagctcccag 8820
cattcccact tgctgtgtgg agacagcatg ttgcagagcc atgtgaggat gagggtccag 8880 
gccaggagga tgtcaaccca ccaccatgta gccagtgggc tgggggagct tgggcccacc 8940
aaggagcttg agcagactga cagtgggtta tgtacacaag tgggcgtgtc acacaaccgt 9000
gcaacacaga gaaaatccct gtgatgacaa cttctaaacc accctgaggc aaaaggagta 9060
gacaggggat tagagcctag catattggag tcgagtggcc atgcagctct tggaagcgtg 9120
aggaaggaaa tttcctggaa ggataggttg tcttcctagc agcctcgtca atagatgtca 9180
atgtatgagg tagtacctgc tacaatcctg cttcttcaga agactgaggc agggggatta 9240
cttgaaccca gaagttctag gccagtatgg acaacatagc aagagtctga tttaaaaaaa 9300
aaaaaaagta aagagggaaa ccaaataggt gacgtgccac actagtgtct tcctgctcca 9360 
agggtcctgg gcacatgagc ttgcttagtg ccagaaaagt cagaggagga gagggcagac 9420
agagaccctc gtctccacct cctttgactg actaatgggg ctggatataa tctgttttac 9480
aaaaggacag cttttcagag ctgtttctat ctaaggttgc tctgaatagc catctcgaaa 9540
tatgccagag aagaatattt agaggcggcc tattttggtc tcccacaaag atttcacagg 9600 
gaaaatgtat ttgtgttcta tttatacaac taaaaatatg catcagcccg gggaaactgg 9660 
ctctttgctg cctttgaagt gaaggatggg ttaatttcta agaaagtaaa agcaaatgta 9720
gtgcaggcac cacggatgct gccagacacc agcgtttaag tggcttgaat ggaagagcac 9780 
accccaagta tttgaagtag gtgaggagag agaccaggtc tccagagttg ggcctgcagt 9840 
ggccagggta agtccaaggg gcagtatgac gcagaacaga gtggcaacct ctaccagtag 9900 
tagaattcag gtctcattcc taacctccca taaagcagag aatattgcac tctctttctc 9960 
tctctctctc tctctctctc tctctctctc tctctctctc tctctctaac tcacagagat 10020 
cctcctgcct ctgcttcctc attgctggaa ataaaagcat gtgccaccac atgcagctct 10080 
ccttgtttgg agagagaaag gcagacagac agacagacag acagacagac agagtataat 10140 
gtgtgcagat gtccacagaa ctccgaagag ggtgttggac ccccagaaac cggagttcta 10200
ggcagttgta agctgtccag tgggtgctgc gaactgaact caagtcctct ggaaaactgg 10260
aaagtactct taaccactaa gctatctgtt agtcccccaa gaatgtctta tcttgatagg 10320 
ccttcagatc tcacagtcca gggggctgac ctgctacaag gggtagagga aagaaaactg 10380 
gctccaggcc tgagttcaga gtgacatctc cagagttctt ccttcccttg ccgtgtagta 10440
atttcctacc ctacacgagg agaaaaggaa cagatatgtc cagtatgcct ggcatcttga 10500 
aagggcactg actcttgccg ctgtagagcc cctttccttg gggagtgcag agaagtgctg 10560 
ctagagaggt tcaaaagaaa cacaacagtc aaaatagttg ctggccaagg gagggcatgt 10620 
gccctgtatc gggcctgcaa agccctttca acagtgtagc caaggccatg tttgacagta 10680 
caagcctgta agcccatgca taagcaaggg ctgggataaa gggctactgt tcaagttagt 10740 
tatatacaca tcaagtttgt tcatcttaca tccttaggta agggtgggta gcttgttagc 10800 
tccacctctc cagcaggaaa gcatgtccac ggtaaggtag atactgtaga gtttagcttt 10860 
gttttgacct ttctttcttc gttgcttggg attgtacttg ggaccttcac tcatgcagcc 10920 
catgtttcag ccttgcataa atctataaat atatggttgg agtcaggtcc agcatatgtg 10980 
ggtaccagtc ccagtctggt atagcaattt attagctatt tgtctttaga caagtaacta 11040
tagggttctg agctgtaatg gccaccaaac aggtccttta agtatcttta ttggccaaac 11100 
accagggtta aaacactgaa cacagtatgg tggtttcaac tgtcatggag tccacattct 11160 
agcaggggaa ttttacacac aactcacaaa tgatacagac cacagcaggg tgatttgaca 11220
gagacaacga agggcttgcc tgagctgcag tagtccggga gcagctcttg atgggatctg 11280
agatctagat ggcagagcgc ccttcctaga tgaaatccaa gcctctggtt actcctcagc 11340 
aagagtagct ggaggaagac cagaagtgag gaggaaaagg ctcagagtgg agcatctaag 11400 
cctgcagcac ctccagagac aggctttgga gcttgcctgt ggttcagcct actgtgggaa 11460 
gagaccagtg agggttttaa gctgtgggga tgggtatatt ttggaaggcg aacaagaaca 11520
ccagcagctc aggcaatgta ggagggcggg aaggtgatgg atgatcgagg acggccccag 11580 
agattttaag gactgtgtag gtggggtgag aagcactgtg ggtgaggtgg agttagaggg 11640 
aagaagtgaa gatctatttt tttggccatg aggaatttgg ggtgcacaca cacacataca 11700 
tacacacccc atcatgtcct tgtctacaga cggggtgata atggtactgg aaggtagaag 11760 
gtggaaagag gaccagcaag gaaggtatcc agtgcccacc ccacagctca ccctcagaca 11820 
gaccttgttc tcattttcat tacccaggtg cccggagagg ccagtcccat caccatggtt 11880 
gccaaactca gccaattgac aagtctgctg tcttccattg aagataag 11928 



<210> 3 

<211> 560 

<212> DNA 

<213> Mus musculus 

<400> 3 

ctctgggttc atcaagaaac cctacctcca taacataaag tgtgatggag aaaggcacct 60 
aatgtcaacc tcaaacccct acctgcatgt gcacacacat acatccacac cacacacaca 120 
cacacacaca cacacacaca ccacacacac acacacacaa ataaataagt aaataaataa 180 
aatatttagc tctccagacc aaatcttggt gaaacccatg catttgcatt tgtgtgtgtc 240 
ctacaaacac tgaaggttaa gaagcatgct ccttagtaat tttatagcag tttgcgtttc 300 
cagattgaaa acagattcta taggctacac agtgctaaat ggattatgct cagatacaga 360 
ttgaaaagga tacagattga aaagggtcgg ggtctgggcc aggatgacgg gccaactatc 420 
tttgcccggg cttgtccttc agggaagggt tacaggattc accactgggg tgtggcctat 480 
ctgctgttag gacctgaatt gcctggagtg tttctagttc ccactagttg ttgaacttta 540 
ccttgaacct ctgctcccag 560 



<210> 4 

<211> 560 

<212> DNA 

<213> Homo sapiens 

<400> 4 

agcccgagga ttggagttgt caatggatgt gacaatggga aaatccgtct gagcctgcat 60 
ttgggctgct aggaggggat ttgcatcaga atccacagat caccagcact gggcagccct 120 
aatatttaaa atgcagattc tagactcaat caggcgggag cccagaaatt tgcattgtta 180 
acacctgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgttttata aacacagaag 240 
gttgggaacc atggataact aagtgaagtc attttgtcac tcagatttga attttctaca 300 
ggctatagag tgcagtttgg ctaaagcaaa acctaggtac agtcaggact acacaattcc 360 
agttcgctgt gggttgggaa gggatgggtg ggccagtgct ggcaagcctt gatctttgcc 420 
cgggcttgtc cttctgggga gaattacctg cttctgctgg actgagggtg ccctcatctc 480 
tggctagagc ccgtgctgcc atggaagact ctttccggtg cccactaatc cttgatgttc 540 
accttgtccc ctgcccccag 560 



<210> 5 

<211> 11539 

<212> DNA 

<213> Mus musculus 

<220> 

<221> modified_base 
<222> (1847) 

<223> N = A, C, G OR T/U 
<400> 5 

gagctctaac tgagctggga aagttgatca tttgtttatt ccttttaggg tattgggggg 60 
gcacggatgt atgcatgctg gtatgtatgc atgctgggga tgtagggcct catgtgtgct 120 
aaatacatgt gcctattctg taatgttttc ttgtttgttt ttaactcaaa ttaattagag 180 
gcagtttctc tgttaaaaaa aaaaaggtgg aagacactgc catctcattt gtgttgggac 240 
ttgagatata tatatatata tatatatata tatatatata tatataattt atttatttat 300 
ttatttgtgt atgtgtgtat aagaaagcgg gggacactca tgtgccatgg tgtgtatggg 360 
gacacacgtg tcctgtgggg gaacattgtg tcaggggaac aaacatgtgc tgagggtggg 420 
gggatatata tgtgccactg tggtgatttg gttgggggag acatgtatgt gctatgatac 480 
ctgtgtgtag gacagaggac acactcacac aggagccttc ttacctgctg agctgtctca 540 
agcagtccaa gmcctccatg gctgtagcca ctcttcctcc ttgctacagt cccacatcca 600 
sccaacacat aaaggctctg gcaggaaata aattaaatyt gctttgtgtg tgggtgctga 660 
cggctcaart cttccggtga tggtaatgtt taaarcaarc caamatcagt tctccagtgc 720 
cgaagttatt gaatgactga ccaatgggta acacttagga tttttaaaaa ttattttgat 780 
tgtacatttt gaattacatt aatcttaaaa taaactacaa acataaacac actgtggttg 840 
gagctgggca tagtggcaca tgcctttaat tccagaactt gggaggcaga ggctggcata 900 
tctctatgac tttgaggcca gcctggtcta tataatgagt tccaggacag agagatcctg 960 
tctcacaaac agagaaaacc ccataactat acttttattg tactggttgg tctacagtgt 1020 
aaagaattgg caaagaatat gaaagattac actgggaaga aactgaaagc catccagagt 1080 
aatgaagcaa acccctcaca aatgtgggaa acatttcaca tgggtgaggg cttgcagcca 1140 
gcctgtgcta atttatgttt tggtacgtga gcacttagag cagtttccag ttctctggtg 1200 
ctctaccaat cttagcttgg ttaatattca gggatgagtc ttccatcacc ggtaatctaa 1260 
tttgccattc tatttaaagg ctcttaaagg cacaggcagt gcattgggta aatgtggcaa 1320 
ataatttctt tgacaaatcg accaattgtc agattggcct gctagtcatt tgtttcaatg 1380 
agaactgttt tttctcaaag gatgctcttg tacaccggct agaagcaggg ctgtcatttt 1440 
tataggtctc tgtggtattt tgttgttgtt gttctgttgt tgcaaatcat ttatcactga 1500 
gggaaaatac acacaaaggg ccctttcttt aaaagtatac atgtatcatt ttgtgacagc 1560 
tcataagaag ctgttttttt ctgcctggac acaggtcctg acctgtgctg tgtccttgct 1620 
aagctttgtc agacccttcc acagctcccc ccaacaacga gttccccagt acctgcctca 1680 
cctcatcact atggtgacag cagcctctga tgcgctgact ctctggcaca ttatggcagt 1740 
gttaaaagct tccatctctc tctttgctga ataatgaacc tcaggttgtt caagagaccg 1800 
gaatgttctt cacctgcctg cacacatctc ttcactttct tttatanatc aggtagggac 1860 
tgggcgtgta gatggaacaa actgttttcc gttccccagc catctctgca ggtgcactcc 1920 
acataaatca agtgttaaaa gtgctttgat taaacaggac aggcgcgttc ttgagttcat 1980 
ctgttcacat actgtctggc aagcgctgac tgagggtctc ctctgtaccc tgttctgaga 2040 
actaacaaaa gacgaatcaa catacagaaa actgttattt agtgactgat taaactaacg 2100 
aaggcatggg ctggagaaat aactcagcag ttaagaacat ttgctgatct tgcagaggac 2160 
ctgggttggg ttcctagcac acacagggac agttccagtc ccggtgtgtc cttttctcac 2220 
ttctgtggac acaagtttta cacatagtgc acacacatac actcacatat ataaaacaga 2280 
acatttaaaa gtatgtttaa ataacggaat catttatata ggttttcatt tacataggta 2340 
aataggcaaa aatctgcatt ttattgtttc taagttttaa tttatttttc tctgtgtgta 2400 
catacgcatg cctccttatc tgtatgtgcg tgcactgtgt gcatgcatga acccacagag 2460 
accagaagag taccacagat tctctggagc tggagtgatt gataggctgt tgggagccac 2520 
tccacatggg gattcagagt tgaacttcgt tctctgcaag aacagccagc tcttaactga 2580 
tggcttttac ctccagccac cttttcctca tttttaaaat ttccttcctt cctttttgag 2640 
acagggtctc aatacttagc tcatcccaac tbgaccccac tcttctcttg ccttagtcac 2700 
cacaatgttt agtttataag catgcgtcac tatgcccggc tttaaataaa ctcacccata 2760 
atcccagcac tgaagtagac aaaagggagg atcgatgggg ctgactggcc acaagccgtg 2820 
cttcaagttc aatgaagacc ctgtctcaag ggaataaggc acagaggata gagccatacg 2880 
cctgacctcc tcctctggcc tctacccagg cacatgtgca tacacacacc acacacacac 2940 
acacacacac acacacacac acagagagag agagagagag agagagagag agagagagag 3000 
agagagagaa actttttcct cttttttttt aaaaatatta tttatttcat gtatatgagt 3060 
acactgttgc tgtcttcaga caccccaaaa gaaggcatca gatcccattg acaaatggtt 3120 
gtgagccacc atgtggttgc tgggaattga actcaggacc tctggaaaag cagtcagtgc 3180 
tcttaaccat ctcttcagcc ccacaaagaa acttttaatg agcaaataat tgcttccaag 3240 
taaatactac taatatattt ctaaccatac tatacaagga attattaaag aacggataat 3300 
aggagaataa aaaattataa gtcactttat aatgctatct aatccatcta gaacaaaaac 3360 
actgtaataa tgcaaaagag cgcagtgcct agattaaata aataaaatgc agaccaataa 3420 
gtaaacttta tagcagcaca tggaaatgac gaaattccta acaaaaagct caagatgggc 3480 
agtttattta aagtgaaata caggagaaat aaagcacaga aagatactca aaggcataga 3540 
agttaacata ggggggctgg cgagatggct ctacaggtaa gagcacccga ctgctcttbc 3600 
aaaggtcctg agttcaaatc ccagcaacca catggtggct cacaaccatc cgtaatgaga 3660 
tttgactccc tcttctggtg tgtctgaaga cagctacagt gtacttaaca tataataaat 3720 
aaataaaact ttaaaaaaaa aattaaagaa gttaacatag aagcccactc aggaccccac 3780 
tcagtcctag agtatgacat tattatggac attaaaaaga gaaaattcag cagtagtgtg 3840 
catgcactgc atatatacac aaatccttga gtttcatacc aaatgccttt agaccacttg 3900 
tggctctgca aacctgtaat cctagcactt gtgaaaaggt cagctcagga actttgggaa 3960 
ggtcatgaaa ctcttgcccc tccagaaggg agaggctaat taacatttct cagaccacag 4020 
ggcgggaacc gacctgcggg tggggacaga ctgttgccca tttccagact agggaagtcc 4080 
ttgtcacctc attccctaaa gaccaatcaa tttaaagggt gcactgttcc gccaatcata 4140 
ttgtgcctag ttgctgatgc tctattctgc ccttagaaac cgtataaaaa ctagcgaagg 4200 
ggtaccaggg gtaaccccct ctccttcagg tctgggacaa tcccactaca ctggaacaat 4260 
aatttcctct ggctttttgc attgatcaca gctccacttc gtggtaagtt aagactccct 4320 
ggagtcttac attggcaaat gcaggcaaaa gaatccgaaa ctcaaggtca tctaaaacta 4380 
catagcaagc atgctgctag cctgggctcc atgagaccct gggggagggg cagagggaga 4440 
ccgttcagaa gacagtcaag atgttgcagc agcacaggca gcctggccac cagtgctgtc 4500 
accagacatg ttaatgttgg aataaagcct caatcatgac tctcccagtt ttataattgg 4560 
aaataagaaa ggaaagacta taggaacaac tgtgttcaga acactattta taatagcaaa 4620 
gatctcagag taacccaaac ttctagacat tgatttggga agatctcttg gcagcttatt 4680 
ttgaaaactt tacaatgtta aatatgtaaa aacaaggaca gttttgtttt ttgttttgtt 4740 
ttgttttgtt ttagggatat atattcatat atgtatatga atgaaaaccc aaacttaaaa 4800 
ttccccacta tgctttaaag gctttctgac aataacagaa agagaaatag agaatccata 4860 
aaaactagtt ctgaaactat caataggctt gacactcttt agctgccagg agagctgaat 4920 
ctgaacacag ggaaccccac ccagcacccc aaatttggat tattgtttta ttttatcttt 4980 
cccctacccc caagacaggg tttctctgcg tggtcctggc tgtcctggaa ctcggagatc 5040 
ctctgcctct ctgcctctct gcctctctct ctctctgcct ctctctgcct ctctctctct 5100 
ctctctctgc ctctctctct ctctctgccc ctcgctgcct ctctctgcct ctctctgccc 5160 
ctctctgccc ctctctgccc ctctctcttc ccctctctgc ctctctctct gcccctctct 5220 
gcctctctgc ctctgcctcc tgagtgctgg gatttaaagg catcagccat cacttccagc 5280 
ttcctttatc attttaaaaa gaatttccta tgtgactact gtatttaaat caccacacgg 5340 
ccaatactcc cccccaactc ctcccaaatc ccctctaccc actcaaattc ttatcttgta 5400 
ttctttatca ttattataca tatgtgtata tatgtgtgtg tgtatatata tatatactat 5460 
atactgctaa tgagtaacat ttagtgttat tcattgttgc atgttttcaa tgtgctttcc 5520 
aggaggctgg ggggatggct cagtgggcaa aattctagct gcacaagcct aaggaccagg 5580 
gttcagatcc ccaatataaa ggctggctgg acatggtggc ttgcctatga tactagcatg 5640 
cttgctggaa gcaaagacag ggaatccctg gagacttaga atctcagaag tgatctgggc 5700 
tggacagact agctgaactg gccagctctg ggttcatcaa gaaaccctac ctccataaca 5760 
taaagtgtga tggagaaagg cacctaatgt caacctcaaa cccctacctg catgtgcaca 5820 
cacatacatc cacaccacac acacacacac acacacacac acacaccaca cacacacaca 5880 
cacaaataaa taagtaaata aataaaatat ttagctctcc agaccaaatc ttggtgaaac 5940 
ccatgcattt gcatttgtgt gtgtcctaca aacactgaag gttaagaagc atgctcctta 6000 
gtaattttat agcagtttgc gtttccagat tgaaaacaga ttctataggc tacacagtgc 6060 
taaatggatt atgctcagat acagattgaa aaggatacag attgaaaagg gtcggggtct 6120 
gggccaggat gacgggccaa ctatctttgc ccgggcttgt ccttcaggga agggttacag 6180 
gattcaccac tggggtgtgg cctatctgct gttaggacct gaattgcctg gagtgtttct 6240 
agttcccact agttgttgaa ctttaccttg aacctctgct cccagggaag tcatcaggac 6300 
tctgccatcc ctggagtctc tgcagaggtt gtttgaccaa cagctttccc caggccttcg 6360 
cccacgacct caggtaaggg gtttggattt ggaaagatgc aattgctata ggagggactc 6420 
tgaaggcaga cagacgcacc gcctcctcac gttggctagt ctaatataaa catcgcggtg 6480 
gatggtgagg atagactcca tgcccttttg tgaaggcatt tcctggcatc agctcctgac 6540 
ttcagagttt cacccatcag acaaattgcc tggtgttgga ggaggaggtg agcagggcca 6600 
ttcccatcat ttctcctcag aaatggaaag gcaaggaaaa catgaggttc ttcagacact 6660 
taatccctgg gactgcaaaa tggtggtgcc cctcctccac agctgctcac ggcggggcag 6720 
gagatgaggg ccgaatgaag catagatcta gctatttttt ttttagtgcc ttcagtaaat 6780 
ttaaaatcaa ataagggaag ggacctagat ctttatgtta tggcattgtt aaaagtgaga 6840 
acttgtagcc agggtgtggt ggcgcacacc tttaatccca gcacttggga ggcagaggca 6900 
ggcggatttc tgagtttgag gccagcatgg tctacagagt gagttccagg acagccaggg 6960 
ctacacaggt ttctctatac agagaaacct gtctcgaaaa accaagaaaa aaaagaaaga 7020 
aaaagaaaga aagaaagaaa gaaagaaaga aagaaagaaa gaagaaagaa aggaaggaag 7080 
gaaggaagga aggaagaaag aggacaacat ggtctagggg tcagagagca gatctccaaa 7140 
aacaccaaca atgccttgct gtaaatgtat gtcgttgatt tggggatgtt ggcctccagc 7200 
tcaccatttc ctgccttagc ttccaaagtg ctaggattat aggcttgagc caacacatct 7260 
ggcttacgct tattgtgtgt ggaaggggag tgctgagtgt gctcctgtgt ttggtactta 7320 
tatatgaata tatgtatata cgcatgtacg catacttgca tgtgaaggcc agaggccaat 7380 
gtcagctgct ctcatcttat cctttttatt acattgtatt tatttgtttg tttgtttgtt 7440 
tgtttcatcg tacgcatgca gccactcatg agcatgtaac agcacaggta tgaaggtaga 7500 
cttgcaggag tcagttctct ccttctgtca cttgagttcc aggaccacac tccagccccc 7560 
aggcctgggc tgtaagagag ccatcttact ggtcctctac tttgtcttct gagatagcat 7620 
ctagactcac ggaacctgga gctcatctag atttacattg gctggccagc tgatgcattt 7680 
taaggtcaaa tcttcattcc atccctaccc cacttccact cccagtgctg gagttcggga 7740 
cacctgccac caagcccagt ttttcctgga tgcagaagct ccaaactcag gttcccatgt 7800 
tcgcatggca ggcacatttt cagttaagcc ttccccccag ctcctttacc ctggtctctg 7860 
aatggggggg aggctataaa tcaggctgct ctcagacatt aggtaggaaa tagaccatat 7920 
acatgaggaa agatattcac ctgccccatg ggtaccagga agtgatgtcc aactcctctt 7980 
ttgcttatca ggagaaatgc tgactactac ctctggtaat tttgatgttg ggaggaacag 8040 
ggacattcat aggaccccat tcctcgctgg tgagagtgga gacaggtttt ctgaagggca 8100 
ggagatctgt gtagaaaaga tggatgctgt tttctgaagg gaaatggagg tagagtcgac 8160 
ctgggagaga ggggaggtgg gggatggttg ggaggaatga aaggaagaga gatggcagtt 8220 
gggatgtatt gtataagaga agaatcagaa agaaaaaaag aaaagctacc tgcacccttc 8280 
aagtgttcct ctgtgtggga ggctgtctca gggactacat gggcaccgag aggcatcagt 8340 
gagggtaggt acttgatgtt gtgtccctga aaacaaggac aggaaatctg ctgcatggcc 8400 
taagatggca aaatgtggca caatcaagta aggcccagga ttctgtctgt ggtgcagacc 8460 
tgctgtagaa tgagctccca gcattcccac ttgctgtgtg gagacagcat gttgcagagc 8520 
catgtgagga tgagggtcca ggccaggagg atgtcaaccc accaccatgt agccagtggg 8580 
ctgggggagc ttgggcccac caaggagctt gagcagactg acagtgggtt atgtacacaa 8640 
gtgggcgtgt cacacaaccg tgcaacacag agaaaatccc tgtgatgaca acttctaaac 8700 
caccctgagg caaaaggagt agacagggga ttagagccta gcatattgga gtcgagtggc 8760 
catgcagctc ttggaagcgt gaggaaggaa atttcctgga aggataggtt gtcttcctag 8820 
cagcctcgtc aatagatgtc aatgtatgag gtagtacctg ctacaatcct gcttcttcag 8880 
aagactgagg cagggggatt acttgaaccc agaagttcta ggccagtatg gacaacatag 8940 
caagagtctg atttaaaaaa aaaaaaaaag taaagaggga aaccaaatag gtgacgtgcc 9000 
acactagtgt cttcctgctc caagggtcct gggcacatga gcttgcttag tgccagaaaa 9060 
gtcagaggag gagagggcag acagagaccc tcgtctccac ctcctttgac tgactaatgg 9120 
ggctggatat aatctgtttt acaaaaggac agcttttcag agctgtttct atctaaggtt 9180 
gctctgaata gccatctcga aatatgccag agaagaatat ttagaggcgg cctattttgg 9240 
tctcccacaa agatttcaca gggaaaatgt atttgtgttc tatttataca actaaaaata 9300 
tgcatcagcc cggggaaact ggctctttgc tgcctttgaa gtgaaggatg ggttaatttc 9360 
taagaaagta aaagcaaatg tagtgcaggc accacggatg ctgccagaca ccagcgttta 9420 
agtggcttga atggaagagc acaccccaag tatttgaagt aggtgaggag agagaccagg 9480 
tctccagagt tgggcctgca gtggccaggg taagtccaag gggcagtatg acgcagaaca 9540 
gagtggcaac ctctaccagt agtagaattc aggtctcatt cctaacctcc cataaagcag 9600 
agaatattgc actctctttc tctctctctc tctctctctc tctctctctc tctctctctc 9660 
tctctctctc tctaactcac agagatcctc ctgcctctgc ttcctcattg ctggaaataa 9720 
aagcatgtgc caccacatgc agctctcctt gtttggagag agaaaggcag acagacagac 9780 
agacagacag acagacagag tataatgtgt gcagatgtcc acagaactcc gaagagggtg 9840 
tgggaccccc agaaaccgga gttctaggca gttgtaagct gtccagtggg tgttgcgaac 9900 
tgaactcaag tcctctggaa aactggaaag tactcttaac cactaagcta tctgttagtc 9960 
ccccaagaat gtcttatctt gataggcctt cagatctcac agtccagggg gctgactgct 10020 
acaaggggta gaggaaagaa aactggcccc aggcctgagt tcagagtgac atctccagag 10080 
ttcttccttc ccttgccgtg tagtaatttc ctaccctaca cgaggagaaa aggaacagat 10140 
atgtccagta tgcctggcat cttgaaaggg cactgactct tgccgctgta gagccccttt 10200 
ccttggggag tgcagagaag tgctgctaga gaggttcaaa agaaacacaa cagtcaaaat 10260 
agttgctggc caagggaggg catgtgccct gtatcgggcc tgcaaagccc tttcaacagt 10320 
gtagccaagg ccatgtttga cagtacaagc ctgtaagccc atgcataagc aagggctggg 10380 
ataaagggct actgttcaag ttagttatat acacatcaag tttgttcatc ttacatcctt 10440 
aggtaagggt gggtagcttg ttagctccac ctctccagca ggaaagcatg tccacggtaa 10500 
ggtagatact gtagagttta gctttgtttt gacctttctt tcttcgttgc ttgggattgt 10560 
acttgggaac ttcactcatg cagcccatgt ttcagccttg cataaatcta taaatatatg 10620 
gttggagtca ggtccagcat atgtgggtac cagtcccagt ctggtatagc aatttattag 10680 
ctatttgtct ttagacaagt aactataggg ttctgagctg taatggccac caaacaggtc 10740 
ctttaagtat ctttattggc caaacaccag ggttaaaaca ctgaacacag tatggtggtt 10800 
tcaactgtca tggagtccac attctagcag gggaatttta cacacaactc acaaatgata 10860 
cagaccacag cagggtgatt tgacagagac aacgaagggc ttgcctgagc tgcagtagtc 10920 
cgggagcagc tcttgatggg atctgagatc tagatggcag agcgcccttc ctagatgaaa 10980 
tccaagcctc tggttactcc tcagcaagag tagctggagg aagaccagaa gtgaggagga 11040 
aaaggctcag agtggagcat ctaagcctgc agcacctcca gagacaggct ttggagcttg 11100 
cctgtggttc agcctactgt gggaagagac cagtgagggt tttaagctgt ggggatgggt 11160 
atattttgga aggcgaacaa gaacaccagc agctcaggca atgtaggagg gcgggaaggt 11220 
gatggatgat cgaggacggc cccagagatt ttaaggactg tgtaggtggg gtgagaagca 11280 
ctgtgggtga ggtggagtta gagggaagaa gtgaagatct atttttttgg ccatgaggaa 11340 
tttggggtgc acacacacac atacatacac accccatcat gtccttgtct acagacgggg 11400 
tgataatggt actggaaggt agaaggtgga aagaggacca gcaaggaagg tatccagtgc 11460 
ccaccccaca gctcaccctc agacagacct tgttctcatt ttcattaccc aggtgcccgg 11520 
agaggccagt cccatcacc 11539 



<210> 6 
<211> 22 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 






<400> 6 

atctttgccc ggggcttgtc ct 

<210> 7 
<211> 50 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 7 

gtctgggcca ggatgacggg ccaactatct ttgcccgggc ttgtccttca 

<210> 8 
<211> 12 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 8 
cactagaagg tt 

<210> 9 
<211> 13 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 9 

gtgcgtgggc cag 

<210> 10 
<211> 10 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 10 
tcagggagag 

<210> 11 
<211> 12 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 11 
gtgcgcctat ct 



<210> 12 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 12 

tactcctcag caagagtagc tgg 

<210> 13 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 13 

gctgaacttg tggccgttta cgt 

<210> 14 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 14 

cttctatagc cttcccaagc c 

<210> 15 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 15 

ctcgtaggtc tcacaggaag 

<210> 16 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 16 

cctgctggat tacattaaag cactg 



<210> 17 
<211> 25 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 17 

gtcaagggca tatccaacaa caaac 



<210> 18 
<211> 24 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 18 

ggcgttctct ttggaaaggt gttc 

<210> 19 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 19 

ctcgaaccac atccttctct 

<210> 20 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 20 

ttgctgcacg agggctcaga atc 



<210> 21 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 21 

tccgattctc atgctctggc ttg 



<210> 22 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 22 

cagccctgtc tttgccacgt ttg 



<210> 23 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 23 

tccactggat tcatcccgct ctg 



<210> 24 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 24 

cttcctcttg caacagagaa ccc 



<210> 25 
<211> 23 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Synthetic 
Primer 

<400> 25 

actcaacgtc cactttgaga tgc 